Sägetstrasse 18, 3123 Belp, Switzerland +41 79 173 36 84 info@ict.technology

      Terraform @ Scale - Part 1e: Scaling Across Organizational Boundaries

      Managing Terraform infrastructure becomes particularly challenging when it spans multiple business units or even different customer organizations.
      In such scenarios, it is no longer sufficient to simply set up individual workspaces or pipelines in a technically clean manner. Instead, decision-makers, CTOs, architects, and senior engineers require clearly structured responsibilities, strict governance, and fully automated processes to ensure consistency, security, and efficiency. We have already discussed the separation of states in detail, but let us briefly summarize the key points once again.

      Team Structures and Responsibilities

      Responsibility for the entire infrastructure is ideally divided into at least three areas:

      Platform Engineering Team

      The Platform Engineering Team manages the root tenancies as well as all central cloud accounts and compartments of the company, thereby creating the organizational framework for all downstream activities.

      Additionally, the team creates, versions, and maintains the base modules that all other teams use to ensure a uniform technical foundation.

      At the same time, it defines binding architectural guidelines, tagging standards, and naming conventions - rules that ensure consistency and transparency across the company.

      Finally, the Platform Engineering Team operates and hardens the Terraform backend and the CI/CD platform to guarantee stable, scalable, and audit-proof operations.

      Service Teams

      Service Teams use the provided base modules and develop domain-specific or product-specific service modules that are precisely tailored to their fields.

      They take responsibility for the infrastructure of their business units and directly transform business requirements into Infrastructure-as-Code artifacts.

      They conveniently obtain the required modules through a private registry, Git submodules, or established package managers, which ensures consistent access to current versions.

      Application Teams

      Application Teams provision application-specific resources based on the service modules, allowing them to fully concentrate on the added value of their software.

      They integrate infrastructure and code into seamless deployment pipelines, enabling releases to be reproducible, auditable, and automated.

      Despite their high level of autonomy, they commit to adhering to all centrally defined guidelines to ensure security and governance across the entire platform.

      Trust Zones and State Isolation

      Each organizational level has its own trust zone. Backend configurations, IAM policies, and CI/CD pipelines ensure that teams can access only those states for which they are responsible. This prevents, for example, an application team from unintentionally altering platform resources.
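As a rough illustration, a pipeline could refuse to touch a state outside its own trust zone before Terraform is even started. The following Python sketch is not part of any Terraform tooling; the zone names and state key prefixes are assumptions made up for this example, and in practice the same rule would primarily be enforced through IAM policies on the backend itself.

```python
import posixpath

# Hypothetical mapping of trust zones to the state key prefixes they own.
STATE_PREFIXES = {
    "platform": ("platform/",),
    "service": ("services/",),
    "application": ("apps/",),
}


def may_access_state(zone: str, state_key: str) -> bool:
    """Return True only if the state key lies inside one of the zone's prefixes."""
    # Normalise first so that "apps/../platform/..." cannot escape the zone.
    normalized = posixpath.normpath(state_key)
    return normalized.startswith(STATE_PREFIXES.get(zone, ()))
```

With this guard, `may_access_state("application", "apps/shop/terraform.tfstate")` is allowed, while the same zone asking for a key under `platform/` is rejected.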

      Governance and Compliance

      Policy-as-Code

      HashiCorp Sentinel or Open Policy Agent enforce binding rules as code and thereby act as an automated control point throughout the entire infrastructure lifecycle. The following Sentinel policy, for example, requires three mandatory tags on every newly created OCI compute instance:


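      import "tfplan/v2" as tfplan

      # Tag keys every new instance must carry. Note that OCI defined tags are
      # usually keyed as "<namespace>.<key>"; adjust the names to your tag namespace.
      mandatory_tags = ["Owner", "CostCenter", "Environment"]

      # All OCI compute instances that this plan is going to create
      new_instances = filter tfplan.resource_changes as _, rc {
        rc.type is "oci_core_instance" and rc.change.actions contains "create"
      }

      # Every new instance must define a value for each mandatory tag
      validate_instances = rule {
        all new_instances as _, rc {
          all mandatory_tags as tag { rc.change.after.defined_tags[tag] is not null }
        }
      }

      main = rule { validate_instances }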
      

      Compliance Automation

      The automatic generation of compliance documentation directly from Terraform outputs reduces manual effort and creates audit evidence that is traceable at any time.
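A minimal sketch of this idea in Python, assuming the report is built from the JSON that `terraform output -json` prints (an object mapping each output name to `sensitive`, `type` and `value`). The report layout and the output names in the example are assumptions for illustration only.

```python
import json


def render_compliance_report(outputs_json: str) -> str:
    """Turn the JSON printed by `terraform output -json` into a plain-text report."""
    outputs = json.loads(outputs_json)
    lines = ["Terraform output report"]
    for name in sorted(outputs):
        entry = outputs[name]
        # `terraform output -json` prints sensitive values in clear text,
        # so they are redacted before they end up in an audit document.
        value = "<redacted>" if entry.get("sensitive") else json.dumps(entry["value"])
        lines.append(f"- {name}: {value}")
    return "\n".join(lines)


# Hypothetical outputs, shaped like real `terraform output -json` data.
example = json.dumps({
    "vcn_id": {"sensitive": False, "type": "string", "value": "ocid1.vcn.example"},
    "db_password": {"sensitive": True, "type": "string", "value": "s3cret"},
})
report = render_compliance_report(example)
```

In a pipeline, the input would come from running `terraform output -json` after the apply phase, and the resulting report would be stored as a build artifact.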

      Additionally, regular scans of the productive infrastructure perform a continuous comparison with the declared state and uncover deviations at an early stage.
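One simple way to implement such a scan is a scheduled job that runs `terraform plan -detailed-exitcode` and evaluates the exit code: 0 means no changes, 1 means the plan failed, and 2 means the plan contains changes. The Python sketch below wraps this; the function names and status strings are assumptions for this example. Note that exit code 2 also occurs when the configuration has changed but has not been applied yet, so the scan should run against the applied revision.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a scan status."""
    if code == 0:
        return "in_sync"  # declared and actual state match
    if code == 2:
        return "drift"    # the plan contains changes
    return "error"        # 1 (or anything else): the plan itself failed


def scan_for_drift(workdir: str) -> str:
    """Run a plan in an already initialised working directory and classify it."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```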

      Multi-Account Strategies

      Separate cloud accounts increase the isolation between tenants, but they require disciplined credential management and adjusted CI/CD pipelines so that operations and security do not suffer. Therefore, use Terraform workspaces together with provider aliases, one alias per tenant, as demonstrated here:


      provider "oci" {
        alias            = "tenant_a"
        auth             = "ApiKey"
        tenancy_ocid     = var.tenant_a_ocid
        user_ocid        = var.tenant_a_user_ocid
        fingerprint      = var.tenant_a_fingerprint
        private_key_path = var.tenant_a_private_key_path
        region           = var.tenant_a_region
      }
      
      provider "oci" {
        alias        = "tenant_b"
        auth         = "ApiKey"
        tenancy_ocid = var.tenant_b_ocid
        # user_ocid, fingerprint, private_key_path and region are set
        # analogously to tenant_a
      }

      CI/CD Integration

      A robust pipeline workflow fundamentally passes through the phases Validation, Planning, Approval, Apply and Verification, thus forming the foundation for reproducible and auditable deployments in multi-tenant environments.

      Dynamic Tenant Configuration

      Tenant information can be dynamically read from various sources. These are typically databases or directories, but they can also be simple YAML constructs or JSON. An example:


      tenants:
        - name: customer_a
          environment: production
          region: eu-frankfurt-1
          approval_required: true
        - name: internal_test
          environment: development
          region: eu-amsterdam-1
          approval_required: false
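
How a pipeline might consume such a tenant list can be sketched in Python. The example below uses the same data expressed as JSON, since Python's standard library has no YAML parser. The phase names follow the workflow described above; the function name and the rule that a missing `approval_required` flag defaults to requiring approval are assumptions of this sketch.

```python
import json

# The tenant list from the example above, expressed as JSON.
TENANTS_JSON = """
{"tenants": [
  {"name": "customer_a", "environment": "production",
   "region": "eu-frankfurt-1", "approval_required": true},
  {"name": "internal_test", "environment": "development",
   "region": "eu-amsterdam-1", "approval_required": false}
]}
"""

PHASES = ["validation", "planning", "approval", "apply", "verification"]


def phases_for(tenant: dict) -> list[str]:
    """Return the pipeline phases for one tenant, dropping the manual gate if allowed."""
    # Safe default: a tenant without the flag still gets the approval phase.
    if tenant.get("approval_required", True):
        return list(PHASES)
    return [phase for phase in PHASES if phase != "approval"]


pipelines = {t["name"]: phases_for(t) for t in json.loads(TENANTS_JSON)["tenants"]}
```

Here `customer_a` passes through all five phases, while `internal_test` skips the manual approval.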

       

      Example Job for Policy Validation (GitLab)

      If you do not use Sentinel and rely on an external tool instead, you can integrate that tool as an additional pipeline step. In this example it is a Python script:


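      validate_policies:
        image:
          name: hashicorp/terraform:1.7
          entrypoint: [""]   # the image's default entrypoint is the terraform binary
        stage: validate
        before_script:
          - apk add --no-cache python3   # the Alpine-based image ships without Python
        script:
          - terraform init -input=false
          - terraform plan -input=false -out=tfplan
          - terraform show -json tfplan | gzip > tfplan.json.gz
          - python3 scripts/validate_policies.py tfplan.json.gz
        artifacts:
          paths:
            - tfplan.json.gz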
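
The article does not show the contents of `scripts/validate_policies.py`. A minimal sketch of what such a script could look like, assuming it should mirror the mandatory-tag rule of the Sentinel policy above, is given below. It relies only on the documented layout of the plan JSON (`resource_changes`, `change.actions`, `change.after`); the function names and messages are assumptions.

```python
import gzip
import json
import sys

MANDATORY_TAGS = ("Owner", "CostCenter", "Environment")


def find_violations(plan: dict) -> list[str]:
    """Return one message per newly created instance that lacks a mandatory tag."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "oci_core_instance":
            continue
        change = rc.get("change", {})
        if "create" not in change.get("actions", []):
            continue
        # `after` and `defined_tags` can both be null in the plan JSON.
        tags = (change.get("after") or {}).get("defined_tags") or {}
        missing = [tag for tag in MANDATORY_TAGS if not tags.get(tag)]
        if missing:
            violations.append(f"{rc.get('address')}: missing tags {', '.join(missing)}")
    return violations


def main(path: str) -> int:
    """Load the gzip-compressed plan JSON and report violations; 1 means failure."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        plan = json.load(handle)
    violations = find_violations(plan)
    for message in violations:
        print(message, file=sys.stderr)
    return 1 if violations else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

A non-zero exit code makes the GitLab job fail, which stops the pipeline before the apply phase.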

       

      Self-Service Portals

      A self-service portal with a curated module catalog allows business units to order standardized infrastructure with just a few clicks, while automated CI/CD pipelines ensure controlled provisioning in the background.

      Making the use of a self-service portal mandatory also improves the overall compliance of the company, because only the portal itself then requires the permission to provision at the level of the application teams. Anything the self-service portal does not provide can no longer be implemented either, which further reduces the risk of shadow IT and makes it easier to implement zero trust, as the application teams themselves are no longer considered a trusted organization. In this way, you treat your internal end users like customers.

      Important Features of Terraform Cloud and Enterprise

      For complex infrastructures, there is no way around Terraform Cloud or Terraform Enterprise. If you insist on full data sovereignty, as is standard within the European Union, or if you are subject to regulatory security requirements, then the self-hosted Terraform Enterprise is the only viable option.

      Workspace Management

      Separate workspaces per tenant, as supported by Terraform Enterprise and Terraform Cloud, ensure strict isolation and at the same time enable granular versioning and release processes per tenant. These workspaces are not to be confused with the CLI command terraform workspace of the free version of Terraform, which is something entirely different.

      Tags and reusable variable sets significantly simplify configuration management and reduce duplication in large workspace landscapes.

      Team and Permission Management

      A finely graduated role matrix differentiates between read, write, and approval rights per workspace, supporting a clean separation of duties.

      The straightforward SSO integration connects Terraform Cloud or Enterprise to the existing company identity system and accelerates onboarding and offboarding processes.

      Run Triggers and Dependencies

      Changes in a central workspace can automatically trigger plans in dependent workspaces, ensuring consistent upgrades across multiple levels.
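The underlying dependency logic can be illustrated independently of Terraform Cloud. The Python sketch below models workspaces and their upstream dependencies as a graph and derives which workspaces are affected by a change. The workspace names are hypothetical, and the code does not call any Terraform Cloud API.

```python
from graphlib import TopologicalSorter

# Hypothetical workspaces; each key lists the upstream workspaces it depends on.
DEPENDS_ON = {
    "platform-network": set(),
    "service-database": {"platform-network"},
    "app-shop": {"service-database", "platform-network"},
}


def run_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order workspaces so that every upstream workspace runs before its dependents."""
    # Raises graphlib.CycleError if the dependencies contain a cycle.
    return list(TopologicalSorter(depends_on).static_order())


def triggered_by(changed: str, depends_on: dict[str, set[str]]) -> list[str]:
    """List all workspaces that directly or transitively depend on `changed`."""
    affected = []
    for workspace in run_order(depends_on):
        upstream = depends_on.get(workspace, set())
        if changed in upstream or any(dep in affected for dep in upstream):
            affected.append(workspace)
    return affected
```

A change in `platform-network` affects both downstream workspaces, in an order that respects their dependencies, whereas a change in `app-shop` affects nothing else.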

      Private Registry

      The private registry handles the central versioning of all modules, automatically publishes their documentation, and creates a single point of truth for reusable infrastructure components.

      Cost Estimation and Drift Detection

      Integrated cost estimation provides infrastructure cost estimates per tenant before the apply phase, so that budgets can be planned in advance. Cloud providers usually offer tools for this purpose, but even if a cloud provider does not support it natively, you can implement it through custom pricing tags. The values of these tags can come from sources such as CSV files or a database. When prices change, the tags can simply be updated on the existing infrastructure components, because changes to custom tags are usually applied in place and are non-destructive. The real challenge lies in volume-based rates such as network traffic, which are the kind of hidden costs commonly encountered with AWS.
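The pricing-tag approach can be sketched in a few lines of Python. Everything specific in this example is an assumption: the CSV columns, the tag name `PricePerHour`, and above all the prices, which are invented numbers and not real list prices of any provider.

```python
import csv
import io
from decimal import Decimal

# Hypothetical price list; in practice a maintained CSV file or a database table.
PRICE_CSV = """resource_type,price_per_hour
oci_core_instance,0.0500
oci_database_autonomous_database,0.3360
"""

HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


def load_prices(csv_text: str) -> dict[str, Decimal]:
    """Parse the price list into a mapping of resource type to hourly price."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["resource_type"]: Decimal(row["price_per_hour"]) for row in reader}


def pricing_tags(resource_type: str, prices: dict[str, Decimal]) -> dict[str, str]:
    """Build the custom tag that carries the current hourly price of a resource."""
    return {"PricePerHour": str(prices[resource_type])}


def estimate_monthly(resource_types: list[str], prices: dict[str, Decimal]) -> Decimal:
    """Sum the monthly cost of the planned resources; volume-based charges are excluded."""
    return sum((prices[rt] * HOURS_PER_MONTH for rt in resource_types), Decimal("0"))
```

The result of `pricing_tags` could be passed into a module as a tag map, while `estimate_monthly` gives a rough per-tenant figure before the apply phase.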

      In parallel, drift detection continuously monitors deviations between the declared and actual state and immediately reports any anomalies before they become a problem. Drift detection is a feature of Terraform Enterprise and Terraform Cloud.

      Auditing and Revision Security

      A scalable Terraform setup across multiple tenants not only requires structured responsibilities and seamless automation, but also comprehensive auditability of all changes to infrastructure resources.

      Terraform Cloud and Terraform Enterprise offer an integrated audit log that documents all relevant events - such as triggering a plan, changes to variables, team assignments, or executing a terraform apply - chronologically and immutably, and in an exportable form. If necessary, these logs can be integrated into external SIEM systems to meet security policies, legal requirements, or industry-specific compliance standards such as ISO 27001, BSI C5, or SOC 2.

      For companies with particularly high traceability and revision security requirements, it is advisable to complement this with central log management based on solutions like Elastic Stack or Splunk. This allows not only activities at the Terraform level but also contextual information from GitLab, CI/CD pipelines, identity providers, and cloud APIs to be correlated and analyzed.

      Mechanisms that link changesets with ticketing systems (e.g. Jira) and a clear process for change approvals are particularly helpful. They ensure that every change is reviewed and approved by at least one authorized person. This is a central component of separation of duties and least privilege strategies.
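Such a rule can be checked automatically before the apply phase. The Python sketch below is one possible shape of that check; the `ChangeRequest` structure, the Jira-style ticket pattern, and the messages are assumptions of this example and do not correspond to any specific tool's API.

```python
import re
from dataclasses import dataclass, field

TICKET_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # e.g. "INFRA-123"


@dataclass
class ChangeRequest:
    author: str
    description: str
    approvers: list[str] = field(default_factory=list)


def approval_problems(change: ChangeRequest) -> list[str]:
    """Return the reasons why a change must not be applied yet (empty list = OK)."""
    problems = []
    if not TICKET_PATTERN.search(change.description):
        problems.append("no ticket reference")
    # Separation of duties: authors cannot approve their own change.
    if not any(approver != change.author for approver in change.approvers):
        problems.append("no independent approver")
    return problems
```

A change described as "INFRA-123 add subnet" and approved by someone other than its author passes, while a self-approved change without a ticket reference is rejected for both reasons.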

      Conclusion

      Scaling across organizational boundaries demands more than pure Terraform expertise. Those who combine clearly defined team structures, consistent policy checks, automated compliance processes, and modern CI/CD pipelines with the multi-tenancy capabilities of Terraform Cloud or Enterprise create the foundation for secure, reproducible, and efficient infrastructure that reliably functions even across many tenants.