
    Terraform @ Scale - Part 4b: Best Practices for Scaling Data Sources

    In the last part of this series, we showed how seemingly harmless data sources in Terraform modules can become a serious performance issue. Multi-minute terraform plan runtimes, unstable pipelines and uncontrollable API throttling effects were the result.

    But how can you avoid this scalability trap in an elegant and sustainable way?

    In this part, we present proven architectural patterns that allow you to centralize data sources, inject them efficiently and thereby achieve fast, stable and predictable Terraform executions even with hundreds of module instances.

    Included: three scalable solution strategies, a practical step-by-step guide and a best practices checklist for production-ready infrastructure modules.

     

    Best Practice: Scalable Alternatives

    Solution 1 (simple scenarios): Variable Injection Pattern

    Instead of using data sources in modules, inject the required data as variables:


    # Root module: each lookup runs once, regardless of how many databases exist
    data "oci_identity_availability_domains" "available" {
      compartment_id = var.tenancy_ocid
    }

    data "oci_core_subnets" "database" {
      compartment_id = var.compartment_id
      vcn_id         = var.vcn_id

      # Without regex = true, the OCI provider compares filter values literally
      filter {
        name   = "display_name"
        values = [".*database.*"]
        regex  = true
      }
    }

    locals {
      availability_domains = data.oci_identity_availability_domains.available.availability_domains
      database_subnets     = data.oci_core_subnets.database.subnets
    }

    module "databases" {
      for_each = var.database_configs != null ? var.database_configs : {}

      source = "./modules/database"

      # Injected values: the module itself performs no lookups
      availability_domains = local.availability_domains
      subnet_ids           = [for subnet in local.database_subnets : subnet.id]

      name = each.key
      size = each.value.size
    }
    

     


    # modules/database (excerpt): receives pre-resolved data through variables
    variable "availability_domains" {
      type = list(object({
        name = string
        id   = string
      }))
      description = "Available ADs for database placement"
    }

    variable "subnet_ids" {
      type        = list(string)
      description = "Database subnet IDs"
    }

    resource "oci_database_db_system" "main" {
      for_each = var.db_systems != null ? var.db_systems : {}

      # No data source here: placement comes from the injected values
      availability_domain = var.availability_domains[0].name
      subnet_id           = var.subnet_ids[0]
      compartment_id      = var.compartment_id
    }
    

     

    Solution 2 (complex scenarios): Structured Configuration Pattern

    For more complex scenarios, use structured configuration objects:


    # Root module: resolve images and network data once
    data "oci_core_images" "ol8" {
      compartment_id           = var.tenancy_ocid
      operating_system         = "Oracle Linux"
      operating_system_version = "8"
    }

    # Referenced by network_config below
    data "oci_identity_availability_domains" "ads" {
      compartment_id = var.tenancy_ocid
    }

    data "oci_core_vcn" "main" {
      vcn_id = var.vcn_id
    }

    locals {
      compute_images = {
        "VM.Standard.E4.Flex" = {
          image_id         = [for img in data.oci_core_images.ol8.images : img.id if can(regex(".*E4.*", img.display_name))][0]
          boot_volume_size = 50
        }
        "BM.Standard3.64" = {
          image_id         = [for img in data.oci_core_images.ol8.images : img.id if can(regex(".*Standard.*", img.display_name))][0]
          boot_volume_size = 100
        }
      }

      network_config = {
        availability_domains = data.oci_identity_availability_domains.ads.availability_domains
        vcn_id               = data.oci_core_vcn.main.id
      }
    }

    module "compute_instances" {
      for_each = var.instance_configs != null ? var.instance_configs : {}

      source = "./modules/compute-instance"

      compute_config = local.compute_images[each.value.shape]
      network_config = local.network_config
    }

     


    # modules/compute-instance (excerpt): consumes pre-resolved values, performs no lookups
    variable "compute_config" {
      type = object({
        image_id         = string
        boot_volume_size = number
      })
      description = "Pre-resolved compute configuration"
    }
    
    variable "network_config" {
      type = object({
        availability_domains = list(object({
          name = string
          id   = string
        }))
        vcn_id = string
      })
      description = "Pre-resolved network configuration"
    }
    
    resource "oci_core_instance" "this" {
      for_each = var.instances != null ? var.instances : {}
      
      availability_domain = var.network_config.availability_domains[0].name
      compartment_id      = var.compartment_id
      
      source_details {
        source_id               = var.compute_config.image_id
        source_type            = "image"
        boot_volume_size_in_gbs = var.compute_config.boot_volume_size
      }
    }

    Solution 3 (very complex scenarios): Data Proxy Pattern

    For very complex scenarios, create dedicated "Data Proxy" modules:


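    # modules/data-proxy: bundles all platform lookups behind a single output
    data "oci_core_images" "oracle_linux" {
      compartment_id           = var.tenancy_ocid
      operating_system         = "Oracle Linux"
      operating_system_version = "8"
    }

    data "oci_core_vcn" "main" {
      vcn_id = var.vcn_id
    }

    data "oci_core_security_lists" "web" {
      compartment_id = var.compartment_id
      vcn_id         = var.vcn_id

      # Without regex = true, the OCI provider compares filter values literally
      filter {
        name   = "display_name"
        values = [".*web.*"]
        regex  = true
      }
    }

    output "platform_data" {
      value = {
        image_id = data.oci_core_images.oracle_linux.images[0].id
        vcn_id   = data.oci_core_vcn.main.id

        # Expose the security list lookup so the query is not wasted
        security_list_ids = data.oci_core_security_lists.web.security_lists[*].id

        instance_shapes = {
          small  = "VM.Standard.E3.Flex"
          medium = "VM.Standard.E4.Flex"
          large  = "VM.Standard3.Flex"
        }
      }
    }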

     


    # Root module: instantiate the proxy once, then pass its output to every consumer
    module "platform_data" {
      source = "./modules/data-proxy"
      
      tenancy_ocid   = var.tenancy_ocid
      compartment_id = var.compartment_id
      vcn_id         = var.vcn_id
    }
    
    module "web_servers" {
      for_each = var.web_server_configs != null ? var.web_server_configs : {}
      
      source = "./modules/oci-instance"
      
      platform_data = module.platform_data.platform_data
      
      name          = each.key
      instance_type = each.value.size
    }
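
    The receiving side of the data proxy is not shown above. As a minimal sketch, the consuming module could declare the proxy output as a typed object and resolve the shape locally. The file layout, the instance_type validation and the local name shape are illustrative assumptions, not part of the original module:

```hcl
# modules/oci-instance/variables.tf (hypothetical layout)
variable "platform_data" {
  type = object({
    image_id        = string
    vcn_id          = string
    instance_shapes = map(string)
  })
  description = "Pre-resolved platform data from the data proxy module"
}

variable "name" {
  type        = string
  description = "Instance name, taken from the for_each key in the root module"
}

variable "instance_type" {
  type        = string
  description = "Logical size key: small, medium or large"

  validation {
    condition     = contains(["small", "medium", "large"], var.instance_type)
    error_message = "instance_type must be one of: small, medium, large."
  }
}

locals {
  # Translate the logical size into a concrete shape without any API call.
  shape = var.platform_data.instance_shapes[var.instance_type]
}
```

    Attributes of the proxy output that this object type does not list are dropped during type conversion, so the proxy can grow without breaking existing consumers.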

    Performance Comparison

    A concrete example from a customer project deploying 50 VM instances illustrates the dramatic difference:

     
    Number of API calls

    Before (data sources in modules): 150 API calls
    After (variable injection): 3 API calls

    Time required

    Before (data sources in modules):

    $ time terraform plan
    ...
    Plan: 50 to add, 0 to change, 0 to destroy.
    real 4m23.415s
    user 0m12.484s
    sys 0m2.108s

    After (variable injection):

    $ time terraform plan
    ...
    Plan: 50 to add, 0 to change, 0 to destroy.
    real 0m18.732s
    user 0m8.234s
    sys 0m1.456s

     

    Result: 93% less planning time and 98% fewer API calls.

    Variable Injection: Step-by-Step Guide

    Step 1: Centralize Data Sources

    Goal: Remove all data sources from modules and centralize them in the root module to consolidate API calls and establish a single source of truth.

    How: Move all data sources used by modules into the root module. Each piece of information is then queried only once, regardless of how many module instances need it. This reduces the number of API calls from N×M (number of module instances × number of data sources per module) to just M (number of data sources). The figures in the comparison above fit this pattern: 50 instances × 3 data sources = 150 calls, versus 3 calls after centralization.


    data "oci_identity_availability_domains" "ads" {
      compartment_id = var.tenancy_ocid
    }
    
    data "oci_core_images" "ol8" {
      compartment_id           = var.tenancy_ocid
      operating_system         = "Oracle Linux"  
      operating_system_version = "8"
    }
    
    data "oci_core_vcn" "main" {
      vcn_id = var.vcn_id
    }

     

    Step 2: Process Data in Locals

    Goal: Transform raw data source results into a consumable format while keeping complexity out of the modules.

    How: Use locals to filter, sort and convert data source results into structured data formats. This lets you handle complex logic centrally and supply modules with already processed, clean data. With for expressions and conditional expressions, you can also implement fallback mechanisms and validation logic in the same place.


    locals {
      availability_domains = [
        for ad in data.oci_identity_availability_domains.ads.availability_domains : ad.name
      ]
      
      compute_images = {
        standard = [
          for img in data.oci_core_images.ol8.images :
          img.id if can(regex(".*Standard.*", img.display_name))
        ][0]
        
        gpu = [
          for img in data.oci_core_images.ol8.images :
          img.id if can(regex(".*GPU.*", img.display_name))  
        ][0]
      }
      
      network_config = {
        vcn_id               = data.oci_core_vcn.main.id
        vcn_cidr             = data.oci_core_vcn.main.cidr_block
        availability_domains = local.availability_domains
      }
    }
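
    Indexing a filtered list with [0] fails with an "Invalid index" error when no image matches. One possible way to add the fallback and validation mentioned in this step is sketched below. The variable fallback_image_id and the local names are hypothetical, and the check block assumes Terraform 1.5 or later:

```hcl
variable "fallback_image_id" {
  type        = string
  default     = null
  description = "Hypothetical: image OCID to use when the lookup finds no GPU image"
}

locals {
  gpu_image_ids = [
    for img in data.oci_core_images.ol8.images :
    img.id if can(regex("GPU", img.display_name))
  ]

  # try() returns the first argument that evaluates without an error:
  # the first match if the list is non-empty, otherwise the fallback.
  gpu_image_id = try(local.gpu_image_ids[0], var.fallback_image_id)
}

check "gpu_image_resolved" {
  assert {
    condition     = local.gpu_image_id != null
    error_message = "No GPU image found and no fallback_image_id set."
  }
}
```

    A failed check block is reported as a warning and does not stop the run. If a missing image should abort planning, a variable validation or a precondition in the consuming module is the stricter choice.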

     

    Step 3: Define Variables in Modules

    Goal: Create clear interfaces for passing data to modules while ensuring type safety and validation.

    How: Replace data sources in modules with typed variables that include descriptive documentation and validation rules. The type definitions ensure consistency and robustness, while validation blocks guarantee that only valid data is passed into the modules. This makes modules more testable and independent from the cloud provider API.


    variable "availability_domains" {
      type        = list(string)
      description = "List of available availability domains"
      
      validation {
        condition     = length(var.availability_domains) > 0
        error_message = "At least one availability domain must be provided."
      }
    }
    
    variable "compute_images" {
      type        = map(string)
      description = "Map of compute images by type"
      
      validation {
        condition = alltrue([
          for image_id in values(var.compute_images) :
          can(regex("^ocid1\\.image\\.", image_id))
        ])
        error_message = "All image IDs must be valid OCI OCIDs."
      }
    }
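
    The same approach works for structured objects. As a sketch, a module-side declaration for the network_config object built in Step 2 could look as follows. The attribute names follow that local; the two validation rules are illustrative assumptions:

```hcl
variable "network_config" {
  type = object({
    vcn_id               = string
    vcn_cidr             = string
    availability_domains = list(string)
  })
  description = "Pre-resolved network data from the root module"

  validation {
    # cidrhost() fails on malformed prefixes, so can() turns it into a format check.
    condition     = can(cidrhost(var.network_config.vcn_cidr, 0))
    error_message = "vcn_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }

  validation {
    condition     = can(regex("^ocid1\\.vcn\\.", var.network_config.vcn_id))
    error_message = "vcn_id must be a VCN OCID."
  }
}
```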

     

    Step 4: Implement Modules Without Data Sources

    Goal: Completely decouple modules from external API calls and turn them into pure resource definition containers.

    How: Replace all data source references in modules with variable references. This makes modules deterministic and predictable, as they only operate on the parameters passed in and do not make any unexpected API calls. At the same time, it makes modules independently testable, since you can inject mock data through the variables.


    resource "oci_core_instance" "this" {
      for_each = var.instances != null ? var.instances : {}
      
      availability_domain = var.availability_domains[each.value.ad_index]
      compartment_id      = var.compartment_id
      shape               = each.value.shape
      
      create_vnic_details {
        subnet_id = each.value.subnet_id
      }
      
      source_details {
        source_id   = var.compute_images[each.value.image_type]
        source_type = "image"
      }
      
      metadata = {
        ssh_authorized_keys = var.ssh_public_key
      }
    }
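
    To illustrate the testability claim, here is a sketch of a test file for the native Terraform test framework. It assumes Terraform 1.7 or later (for mock_provider), a module that declares exactly the variables used in Steps 3 and 4, and a hypothetical file path; all OCIDs are placeholder values:

```hcl
# modules/compute/tests/injection.tftest.hcl (hypothetical path)

# Replace the real OCI provider with a mock so no API calls are made.
mock_provider "oci" {}

variables {
  compartment_id       = "ocid1.compartment.oc1..example"
  ssh_public_key       = "ssh-ed25519 AAAAexample test@example"
  availability_domains = ["AD-1", "AD-2"]

  compute_images = {
    standard = "ocid1.image.oc1..example"
  }

  instances = {
    web1 = {
      ad_index   = 1
      shape      = "VM.Standard.E4.Flex"
      subnet_id  = "ocid1.subnet.oc1..example"
      image_type = "standard"
    }
  }
}

run "instance_uses_injected_data" {
  command = plan

  assert {
    condition     = oci_core_instance.this["web1"].availability_domain == "AD-2"
    error_message = "Instance should be placed in the injected AD at ad_index."
  }

  assert {
    condition     = oci_core_instance.this["web1"].source_details[0].source_id == "ocid1.image.oc1..example"
    error_message = "Instance should use the injected image ID."
  }
}
```

    Running terraform test from the module directory would then exercise the module without credentials, because every input arrives through variables.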

     

    Step 5: Call Modules with Injected Data

    Goal: Establish the connection between centrally retrieved data and modules to implement a clean data flow pattern.

    How: Pass the data processed in locals as parameters to the modules. This completes the variable injection loop: data is retrieved centrally once, processed and then explicitly distributed to the modules. This explicit data transfer creates clear dependencies that are understandable both for humans and for Terraform itself.


    module "web_servers" {
      for_each = var.web_server_configs != null ? var.web_server_configs : {}
      
      source = "./modules/compute"
      
      availability_domains = local.availability_domains
      compute_images       = local.compute_images
      network_config       = local.network_config
      
      instances      = each.value.instances
      compartment_id = each.value.compartment_id
      ssh_public_key = var.ssh_public_key
    }
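
    The root variable that drives this call is not shown. A possible declaration, inferred from the attributes the examples access, is sketched below; the exact type is an assumption. With default = {} and nullable = false (Terraform 1.1 or later), the value can never be null inside the configuration, so the ternary null guard in for_each could be dropped:

```hcl
variable "web_server_configs" {
  type = map(object({
    compartment_id = string
    instances = map(object({
      ad_index   = number
      shape      = string
      subnet_id  = string
      image_type = string
    }))
  }))
  description = "Web server groups keyed by name, each with its instances"

  # A null input falls back to the default, so for_each always sees a map.
  default  = {}
  nullable = false
}
```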

     

    Best Practices Checklist

    ✅ Do's: Scalable Patterns

    • [ ] Central Data Sources: Define all data sources in the root module
    • [ ] Variable Injection: Pass data to modules via variables
    • [ ] Structured Objects: Organize complex data in typed objects
    • [ ] Validation Rules: Implement variable validations for injected data
    • [ ] Documentation: Write variable descriptions for injected data
    • [ ] Local Processing: Process data in locals in the root module
    • [ ] Data Proxy Pattern: Use separate data modules for very complex scenarios

    ❌ Don'ts: Avoid Anti-Patterns

    • [ ] Data Sources in Modules: Never use data sources in reusable modules
    • [ ] Redundant Lookups: Identical data sources in multiple modules
    • [ ] Complex Filtering: Costly filter operations in every module
    • [ ] Nested Data Sources: Data sources depending on other data sources
    • [ ] Dynamic References: for_each on data source results within modules
    • [ ] Missing Validation: Using injected data without validation
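
    For the "Missing Validation" item, some rules span several inputs, for example an ad_index that must fit the injected list. The following sketch, which assumes Terraform 1.2 or later, uses preconditions on the resource from Step 4 for that purpose; arguments not relevant to the check are omitted:

```hcl
resource "oci_core_instance" "this" {
  for_each = var.instances

  availability_domain = var.availability_domains[each.value.ad_index]
  compartment_id      = var.compartment_id
  shape               = each.value.shape

  source_details {
    source_id   = var.compute_images[each.value.image_type]
    source_type = "image"
  }

  # create_vnic_details and metadata omitted for brevity

  lifecycle {
    # Cross-check each instance definition against the injected data.
    precondition {
      condition     = each.value.ad_index >= 0 && each.value.ad_index < length(var.availability_domains)
      error_message = "ad_index is outside the range of the injected availability_domains list."
    }

    precondition {
      condition     = contains(keys(var.compute_images), each.value.image_type)
      error_message = "image_type has no entry in the injected compute_images map."
    }
  }
}
```

    Terraform evaluates preconditions before the resource arguments, so a bad index surfaces as the custom message rather than a generic index error.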

    Monitoring and Debugging

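    To monitor data source performance, enable debug logging, run terraform plan and evaluate the resulting log file for data source entries:


    export TF_LOG=DEBUG
    export TF_LOG_PATH=./terraform.log

    # With TF_LOG_PATH set, the debug log is written to the file, not to stderr
    terraform plan > /dev/null

    # Rough indicator: log lines that mention data sources or HTTP requests
    grep -cE "(data\.|GET|POST)" terraform.log

    # Number of mentions per data source address, most frequent first
    grep -oE "data\.[a-z0-9_]+\.[A-Za-z0-9_-]+" terraform.log | sort | uniq -c | sort -rn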

    Conclusion: Performant Modules Through Intentional Architecture

    Data sources are a powerful Terraform feature - but in modules, they can become a performance trap. The variable injection pattern offers an elegant solution:

    Advantages:

    • Drastically reduced API calls (95%+ savings possible)
    • A lookup count that stays constant (M) instead of growing with every module instance (N×M)
    • Centralized data logic for better maintainability
    • Explicit dependencies instead of hidden data source calls
    • Better testability through injectable mock data

    The key lies in a paradigm shift: Instead of fetching data when needed, fetch it once centrally and distribute it in a targeted manner.

    At ICT.technology, we have reduced Terraform planning times from minutes to seconds - even with hundreds of module instances - by consistently applying these patterns.