Terraform @ Scale - Part 1e: Scaling Across Organizational Boundaries

Details: Read Time: 8 mins; Created: 29 April 2025

Managing Terraform infrastructure becomes particularly challenging when it spans multiple business units or even different customer organizations.
In such scenarios, it is no longer sufficient to set up individual workspaces or pipelines in a technically clean manner. Decision-makers such as CTOs, as well as architects and senior engineers, need clearly structured responsibilities, strict governance, and fully automated processes to ensure consistency, security, and efficiency. We have already discussed the separation of states in detail, but let us briefly summarize the key points once again.

Team Structures and Responsibilities

Responsibility for the entire infrastructure is ideally divided into at least three areas:

Platform Engineering Team

The Platform Engineering Team manages the root tenancies as well as all central cloud accounts and compartments of the company, thereby creating the organizational framework for all downstream activities.

Additionally, the team creates, versions, and maintains the base modules that all other teams use to ensure a uniform technical foundation.

At the same time, it defines binding architectural guidelines, tagging standards, and naming conventions - rules that ensure consistency and transparency across the company.

Finally, the Platform Engineering Team operates and hardens the Terraform backend and the CI/CD platform to guarantee stable, scalable, and audit-proof operations.

Service Teams

Service Teams use the provided base modules and develop domain-specific or product-specific service modules that are precisely tailored to their fields.

They take responsibility for the infrastructure of their business units and directly transform business requirements into Infrastructure-as-Code artifacts.

They obtain the required modules through a private registry, Git submodules, or established package managers, which gives them consistent access to the current released versions.

Application Teams

Application Teams provision application-specific resources based on the service modules, allowing them to fully concentrate on the added value of their software.

They integrate infrastructure and code into seamless deployment pipelines, enabling releases to be reproducible, auditable, and automated.

Despite their high level of autonomy, they commit to adhering to all centrally defined guidelines to ensure security and governance across the entire platform.

Trust Zones and State Isolation

Each organizational level has its own trust zone. Backend configurations, IAM policies, and CI/CD pipelines ensure that teams can access only those states for which they are responsible. This prevents, for example, an application team from unintentionally altering platform resources.
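The actual enforcement belongs in the backend configuration and IAM policies. A pipeline can additionally fail early, with a readable message, before terraform init even runs. The following Python sketch assumes that states are stored under per-zone key prefixes; the zone names, prefixes, and the authorize_state_access helper are hypothetical.

```python
# Hypothetical trust zones and the state key prefixes each one may touch.
TRUST_ZONES = {
    "platform": ("platform/",),
    "service-payments": ("services/payments/",),
    "app-checkout": ("apps/checkout/",),
}


def authorize_state_access(zone: str, state_key: str) -> None:
    """Raise PermissionError unless state_key lies inside one of the zone's prefixes."""
    prefixes = TRUST_ZONES.get(zone)
    if prefixes is None:
        raise PermissionError(f"unknown trust zone: {zone!r}")
    # Reject absolute keys and ".." segments that could escape the prefix.
    if state_key.startswith("/") or ".." in state_key.split("/"):
        raise PermissionError(f"suspicious state key: {state_key!r}")
    # str.startswith accepts a tuple and matches any of the prefixes.
    if not state_key.startswith(prefixes):
        raise PermissionError(f"zone {zone!r} may not access {state_key!r}")
```

An application pipeline would call authorize_state_access("app-checkout", key) with the key from its backend configuration; any attempt to point it at a platform state aborts the job before Terraform touches anything.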

Governance and Compliance

Policy-as-Code

HashiCorp Sentinel or Open Policy Agent (OPA) lets you enforce binding rules as code, creating an automated checkpoint throughout the entire infrastructure lifecycle. The following Sentinel policy rejects any plan that creates an OCI compute instance without the mandatory tags:

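import "tfplan/v2" as tfplan

# Tag keys to enforce. OCI defined tags are namespaced ("Namespace.Key"),
# so adjust these to your tag namespace, e.g. "Operations.CostCenter".
mandatory_tags = ["Owner", "CostCenter", "Environment"]

# All managed OCI compute instances this plan creates (including replacements)
new_instances = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  rc.type is "oci_core_instance" and
  rc.change.actions contains "create"
}

validate_instances = rule {
  all new_instances as _, rc {
    all mandatory_tags as tag {
      # "else null" turns a missing key (undefined) into null, so the check stays boolean
      (rc.change.after.defined_tags[tag] else null) is not null
    }
  }
}

main = rule { validate_instances }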

Compliance Automation

The automatic generation of compliance documentation directly from Terraform outputs reduces manual effort and creates audit evidence that is traceable at any time.
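terraform output -json prints every root module output as an object with sensitive, type, and value fields. As a minimal sketch, assuming Markdown is an acceptable evidence format, a hypothetical render_compliance_report helper could turn that JSON into an archivable table and redact sensitive values:

```python
import json
from datetime import datetime, timezone


def render_compliance_report(outputs_json: str, tenant: str, generated_at: datetime) -> str:
    """Render `terraform output -json` as a Markdown evidence table.

    Outputs marked sensitive are redacted so the report can be archived safely.
    """
    outputs = json.loads(outputs_json)
    lines = [
        f"# Compliance evidence: {tenant}",
        f"Generated: {generated_at.isoformat()}",
        "",
        "| Output | Value |",
        "|---|---|",
    ]
    for name in sorted(outputs):
        entry = outputs[name]
        if entry.get("sensitive"):
            value = "(sensitive, redacted)"
        else:
            # json.dumps keeps lists and maps readable; escape pipes for the table
            value = json.dumps(entry.get("value")).replace("|", "\\|")
        lines.append(f"| {name} | {value} |")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = '{"vcn_id": {"sensitive": false, "type": "string", "value": "ocid1.vcn.oc1..example"}}'
    print(render_compliance_report(sample, "customer_a", datetime.now(timezone.utc)))
```

In a pipeline, the input would come from terraform output -json after a successful apply, and the rendered file would be stored as a job artifact or pushed to the document archive.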

Additionally, regular scans of the production infrastructure continuously compare it with the declared state and uncover deviations at an early stage.
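Such a scan can be built on terraform plan -detailed-exitcode, which exits with 0 when there are no changes, 1 on errors, and 2 when changes are present. The following sketch assumes terraform is on the PATH and the working directory is already initialised; the status names are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`:
# 0 = succeeded with no changes, 1 = error, 2 = succeeded with changes present.
STATUS_BY_EXIT_CODE = {0: "in_sync", 2: "deviation"}


def classify_plan_exit_code(code: int) -> str:
    """Translate the plan exit code into a scan status; anything unexpected is an error."""
    return STATUS_BY_EXIT_CODE.get(code, "error")


def scan_workspace(workdir: str) -> str:
    """Plan against the real infrastructure without changing anything and report the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,  # exit code 2 is a result here, not a failure
    )
    return classify_plan_exit_code(result.returncode)
```

Run on a schedule, a "deviation" result can open a ticket or page the owning team. Adding -refresh-only restricts the comparison to differences between the recorded state and reality, ignoring configuration changes that simply have not been applied yet.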

Multi-Account Strategies

Separate cloud accounts increase the isolation between tenants, but they require disciplined credential management and adapted CI/CD pipelines so that neither operations nor security suffers. For this, use Terraform workspaces together with provider aliases, one per tenant account:

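# One aliased provider configuration per tenant account.
# API key authentication is the OCI provider's default, so no "auth" argument is set.
provider "oci" {
  alias            = "tenant_a"
  tenancy_ocid     = var.tenant_a_ocid
  user_ocid        = var.tenant_a_user_ocid
  fingerprint      = var.tenant_a_fingerprint
  private_key_path = var.tenant_a_private_key_path
  region           = var.tenant_a_region
}

provider "oci" {
  alias            = "tenant_b"
  tenancy_ocid     = var.tenant_b_ocid
  user_ocid        = var.tenant_b_user_ocid
  fingerprint      = var.tenant_b_fingerprint
  private_key_path = var.tenant_b_private_key_path
  region           = var.tenant_b_region
}

# Resources and modules then select their tenant explicitly, e.g.:
#   provider  = oci.tenant_b             (in a resource block)
#   providers = { oci = oci.tenant_b }   (in a module block)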

CI/CD Integration

A robust pipeline passes through five phases: validation, planning, approval, apply, and verification. Together they form the foundation for reproducible and auditable deployments in multi-tenant environments.
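The essential property of this sequence is that each phase is a hard gate for the next one. A minimal Python sketch, assuming each phase is represented by a callable that returns True on success (the Phase enum and run_pipeline are hypothetical names):

```python
from collections.abc import Callable
from enum import Enum


class Phase(Enum):
    VALIDATION = "validation"
    PLANNING = "planning"
    APPROVAL = "approval"
    APPLY = "apply"
    VERIFICATION = "verification"


def run_pipeline(steps: dict[Phase, Callable[[], bool]]) -> list[Phase]:
    """Run the phases strictly in declaration order and stop at the first failure.

    A missing step counts as a failure, so apply can never run without an
    explicit approval step having succeeded first.
    """
    completed = []
    for phase in Phase:  # Enum iteration follows the definition order
        step = steps.get(phase)
        if step is None or not step():
            break
        completed.append(phase)
    return completed
```

For tenants whose configuration sets approval_required to false, the approval step can simply return True; the gate itself never disappears from the sequence.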

Dynamic Tenant Configuration

Tenant information can be read dynamically from various sources. Typically these are databases or directory services, but simple YAML or JSON files work just as well. An example:

tenants:
  - name: customer_a
    environment: production
    region: eu-frankfurt-1
    approval_required: true
  - name: internal_test
    environment: development
    region: eu-amsterdam-1
    approval_required: false
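Because the Python standard library has no YAML parser, the following sketch reads the JSON equivalent of the tenant list above (with PyYAML installed, yaml.safe_load would return the same structure). It derives one plan job and one apply job per tenant. The job names and variable layout are hypothetical; TF_WORKSPACE and TF_VAR_* are standard Terraform environment variables, and when, needs, and artifacts are regular GitLab CI keywords. The sketch assumes the per-tenant workspaces already exist.

```python
import json

TENANTS_JSON = """
{"tenants": [
  {"name": "customer_a", "environment": "production",
   "region": "eu-frankfurt-1", "approval_required": true},
  {"name": "internal_test", "environment": "development",
   "region": "eu-amsterdam-1", "approval_required": false}
]}
"""

REQUIRED_KEYS = {"name", "environment", "region", "approval_required"}


def build_jobs(config: dict) -> dict:
    """Derive a plan job and an apply job per tenant (hypothetical job layout)."""
    jobs = {}
    for tenant in config["tenants"]:
        missing = REQUIRED_KEYS - set(tenant)
        if missing:
            raise ValueError(f"tenant {tenant.get('name')!r} lacks {sorted(missing)}")
        name = tenant["name"]
        variables = {
            "TF_WORKSPACE": name,                 # selects the tenant's workspace
            "TF_VAR_region": tenant["region"],    # available as var.region
            "TF_VAR_environment": tenant["environment"],
        }
        jobs[f"plan:{name}"] = {
            "stage": "plan",
            "variables": variables,
            "script": ["terraform init -input=false",
                       "terraform plan -input=false -out=tfplan"],
            "artifacts": {"paths": ["tfplan"]},
        }
        apply_job = {
            "stage": "apply",
            "variables": variables,
            "needs": [f"plan:{name}"],            # also fetches the saved plan artifact
            "script": ["terraform init -input=false",
                       "terraform apply -input=false tfplan"],
        }
        if tenant["approval_required"]:
            apply_job["when"] = "manual"          # a person has to start the apply
        jobs[f"apply:{name}"] = apply_job
    return jobs


if __name__ == "__main__":
    print(json.dumps(build_jobs(json.loads(TENANTS_JSON)), indent=2))
```

The resulting dictionaries could be serialized into a generated pipeline file; adding a tenant then means adding one entry to the configuration rather than editing pipeline code.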

Example Job for Policy Validation (GitLab)

If you do not use Sentinel but rely on an external tool instead, you can integrate that tool into the pipeline as an additional step, in this case a Python script that evaluates the plan:

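validate_policies:
  image:
    name: hashicorp/terraform:1.7
    entrypoint: [""]   # the image's default entrypoint is terraform itself
  stage: validate
  before_script:
    - apk add --no-cache python3   # the Alpine-based image ships without Python
  script:
    - terraform init -input=false
    - terraform plan -input=false -out=tfplan
    - terraform show -json tfplan | gzip > tfplan.json.gz
    - python3 scripts/validate_policies.py tfplan.json.gz
  artifacts:
    paths:
      - tfplan.json.gz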
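The job calls scripts/validate_policies.py without showing it. A possible shape of such a script, assuming it enforces the same tag rule as the Sentinel policy: only the resource_changes structure is taken from Terraform's JSON plan format, while the function names and messages are hypothetical.

```python
import gzip
import json
import sys

# Assumption: same keys as the Sentinel example. For OCI defined tags the keys
# in the plan are usually namespaced, e.g. "Operations.CostCenter".
MANDATORY_TAGS = ("Owner", "CostCenter", "Environment")


def find_violations(plan: dict) -> list[str]:
    """List every oci_core_instance created by the plan that lacks a mandatory tag."""
    violations = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if rc.get("type") != "oci_core_instance" or "create" not in change.get("actions", []):
            continue
        tags = (change.get("after") or {}).get("defined_tags") or {}
        missing = [tag for tag in MANDATORY_TAGS if tags.get(tag) is None]
        if missing:
            violations.append(f"{rc['address']}: missing {', '.join(missing)}")
    return violations


def main(path: str) -> int:
    """Read the gzip-compressed JSON plan and return a process exit code."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        plan = json.load(fh)
    violations = find_violations(plan)
    for violation in violations:
        print(f"POLICY VIOLATION: {violation}", file=sys.stderr)
    return 1 if violations else 0  # a non-zero exit code fails the GitLab job


if __name__ == "__main__" and len(sys.argv) == 2:
    # Invoked by the pipeline as: python3 scripts/validate_policies.py tfplan.json.gz
    sys.exit(main(sys.argv[1]))
```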

Self-Service Portals

A self-service portal with a curated module catalog allows business units to order standardized infrastructure with just a few clicks, while automated CI/CD pipelines ensure controlled provisioning in the background.

Making the self-service portal mandatory also improves the company's overall compliance, because only the portal itself then needs provisioning permissions at the level of the application teams. Anything the portal does not offer can no longer be implemented at all. This further reduces the risk of shadow IT and makes zero trust easier to implement, as the application teams themselves are no longer treated as a trusted organization. In this way, you treat your internal end users like customers.
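The "only what the catalog offers" principle can be sketched as a validation step in the portal backend. The catalog entry, module source, shape, and build_order function below are hypothetical; the point is that requesters may set only whitelisted parameters, while everything else comes from curated defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogItem:
    module_source: str         # registry module the portal provisions
    allowed_params: frozenset  # the only inputs a requester may set
    defaults: dict             # curated values the requester cannot override


# Hypothetical curated catalog: anything not listed here cannot be ordered.
CATALOG = {
    "small-vm": CatalogItem(
        module_source="app.terraform.io/example-org/compute/oci",
        allowed_params=frozenset({"name", "environment"}),
        defaults={"shape": "VM.Standard.E4.Flex", "ocpus": 1},
    ),
}


def build_order(item: str, params: dict) -> dict:
    """Validate a portal order against the catalog and return the module inputs."""
    entry = CATALOG.get(item)
    if entry is None:
        raise ValueError(f"{item!r} is not offered by the portal")
    unexpected = set(params) - entry.allowed_params
    if unexpected:
        raise ValueError(f"parameters not allowed for {item!r}: {sorted(unexpected)}")
    return {"source": entry.module_source, "inputs": {**entry.defaults, **params}}
```

The validated order would then be handed to the CI/CD pipeline, which is the only component holding provisioning credentials.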

Important Features of Terraform Cloud and Enterprise

For complex infrastructures, there is no way around Terraform Cloud or Terraform Enterprise. If you need exclusive sovereignty over your data, which is the norm within the European Union, or if you are subject to regulatory security requirements, Terraform Enterprise is the only viable option.

Workspace Management

Separate workspaces per tenant, as supported by Terraform Enterprise and Terraform Cloud, ensure strict isolation and at the same time enable granular versioning and release processes per tenant. These workspaces must not be confused with the terraform workspace command of the free Terraform CLI, which is something entirely different.
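Creating these workspaces should itself be automated. The following sketch builds, but does not send, a request against the Terraform Cloud API v2 endpoint POST /organizations/:organization_name/workspaces using the JSON:API payload format. The tenant-<name> naming convention and the organization name are assumptions; for Terraform Enterprise, replace the host with your own.

```python
import json
import urllib.request

API_BASE = "https://app.terraform.io/api/v2"  # Terraform Enterprise: https://<your-host>/api/v2


def workspace_request(org: str, tenant: str, token: str) -> urllib.request.Request:
    """Build a request that creates one workspace for the given tenant."""
    payload = {
        "data": {
            "type": "workspaces",
            "attributes": {
                "name": f"tenant-{tenant}",  # hypothetical naming convention
                "auto-apply": False,         # applies still require confirmation
            },
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/organizations/{org}/workspaces",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
        method="POST",
    )
```

Sending it is a matter of urllib.request.urlopen(req) with a valid team or organization token; keeping request construction separate makes the naming convention easy to test.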

Tags and reusable variable sets significantly simplify configuration management and reduce duplication in large workspace landscapes.

Team and Permission Management

A finely graduated role matrix differentiates between read, write, and approval rights per workspace, supporting a clean separation of duties.
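Terraform Cloud and Enterprise manage the actual permissions themselves. To make the separation-of-duties rule checkable, for example during a periodic access review, such a matrix can be modelled as below. The three rights mirror the paragraph, not the products' exact permission levels, and all team and workspace names are hypothetical.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    WRITE = auto()
    APPROVE = auto()


NO_RIGHTS = Right(0)

# Hypothetical role matrix: (team, workspace) -> granted rights
ROLE_MATRIX = {
    ("app-checkout", "tenant-customer_a"): Right.READ | Right.WRITE,
    ("release-managers", "tenant-customer_a"): Right.READ | Right.APPROVE,
}


def has_right(team: str, workspace: str, right: Right) -> bool:
    """Teams without an entry get no rights at all (deny by default)."""
    return right in ROLE_MATRIX.get((team, workspace), NO_RIGHTS)


def separation_violations() -> list[tuple[str, str]]:
    """Entries where one team could both change and approve the same workspace."""
    both = Right.WRITE | Right.APPROVE
    return [key for key, rights in ROLE_MATRIX.items() if both in rights]
```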

The straightforward SSO integration connects Terraform Cloud or Enterprise to the existing company identity system and accelerates onboarding and offboarding processes.

Run Triggers and Dependencies

Changes in a central workspace can automatically trigger plans in dependent workspaces, ensuring consistent upgrades across multiple levels.
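Terraform Cloud and Enterprise evaluate run triggers themselves, but it helps to know in advance how far a change fans out. This sketch treats the run trigger configuration as a dependency graph and uses Python's graphlib to list the affected workspaces, upstream first. It assumes each triggered run is applied and therefore triggers its own downstream workspaces; the workspace names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical run trigger configuration: workspace -> workspaces that trigger it
TRIGGERED_BY = {
    "platform-network": set(),
    "service-payments": {"platform-network"},
    "app-checkout": {"service-payments"},
    "app-reporting": {"platform-network"},
    "app-unrelated": set(),
}


def plan_order(changed: str) -> list[str]:
    """Return the changed workspace and everything downstream of it, upstream first."""
    affected = {changed}
    grew = True
    while grew:  # keep expanding until no further workspace is pulled in
        grew = False
        for ws, sources in TRIGGERED_BY.items():
            if ws not in affected and sources & affected:
                affected.add(ws)
                grew = True
    # Restrict each node's predecessors to the affected set, then sort
    subgraph = {ws: TRIGGERED_BY[ws] & affected for ws in affected}
    return list(TopologicalSorter(subgraph).static_order())
```

A cyclic trigger configuration makes TopologicalSorter raise graphlib.CycleError, which is a useful early warning in its own right.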

Private Registry

The private registry handles the central versioning of all modules, automatically publishes their documentation, and creates a single source of truth for reusable infrastructure components.
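Consumers typically pin registry modules with Terraform's pessimistic constraint operator ~>, which allows only the right-most specified version component to increase. The following sketch mimics how such a constraint selects a version from a registry's published list. It is an illustration, not Terraform's resolver, and it assumes plain numeric versions without pre-release suffixes.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, ...]:
    """'1.10.2' -> (1, 10, 2); pre-release suffixes are not supported here."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a version against a '~> X.Y' or '~> X.Y.Z' constraint."""
    base = parse_version(constraint.removeprefix("~>").strip())
    if len(base) < 2:
        raise ValueError("expected at least two components, e.g. '~> 1.2'")
    # Only the right-most component may grow: ~> 1.2 means < 2.0, ~> 1.2.3 means < 1.3
    upper = base[:-2] + (base[-2] + 1,)
    return base <= parse_version(version) < upper


def latest_matching(versions: list[str], constraint: str) -> Optional[str]:
    """Pick the highest published version that satisfies the constraint."""
    candidates = [v for v in versions if satisfies_pessimistic(v, constraint)]
    return max(candidates, key=parse_version, default=None)
```

Comparing versions as integer tuples matters: as strings, "1.10.1" would sort before "1.9.3".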

Cost Estimation and Drift Detection

An integrated cost estimation provides reliable infrastructure cost estimates per tenant before the apply phase, allowing for precise budget planning. Cloud providers usually offer tools for this purpose, but even where a provider has no native support, you can implement it through custom pricing tags whose values come from sources such as CSV files or a database. When prices change, the tags can easily be updated on the infrastructure components, because tag changes are usually applied in place and are non-destructive. The real challenge lies in usage-based charges such as network traffic, the kind of hidden costs commonly encountered with AWS.
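The pricing-tag approach can be sketched as follows, assuming every resource carries two hypothetical tags, tenant and price_sku, that the resource list has been extracted from the JSON plan (terraform show -json), and that the price list is a CSV file with hypothetical sku and monthly_eur columns. Usage-based charges are deliberately not covered.

```python
import csv
import io
from collections import defaultdict

# Hypothetical price list, e.g. exported from a CSV file or a database.
PRICES_CSV = """sku,monthly_eur
vm.small,35.00
vm.medium,70.00
block.100gb,4.25
"""


def load_prices(text: str) -> dict[str, float]:
    """Map each SKU to its monthly price."""
    return {row["sku"]: float(row["monthly_eur"]) for row in csv.DictReader(io.StringIO(text))}


def estimate_per_tenant(resources: list[dict], prices: dict[str, float]) -> dict[str, float]:
    """Sum the monthly prices per tenant based on each resource's pricing tag."""
    totals = defaultdict(float)
    for res in resources:
        tags = res.get("tags", {})
        sku = tags.get("price_sku")
        if sku not in prices:
            # Unpriced resources are an error, not silently free
            raise KeyError(f"{res.get('address')}: no price for sku {sku!r}")
        totals[tags["tenant"]] += prices[sku]
    return {tenant: round(total, 2) for tenant, total in totals.items()}
```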

In parallel, drift detection continuously monitors deviations between the declared and actual state and immediately reports any anomalies before they become a problem. Drift detection is a feature of Terraform Enterprise and Terraform Cloud.

Auditing and Revision Security

A scalable Terraform setup across multiple tenants requires not only structured responsibilities and seamless automation, but also comprehensive auditability of all changes to infrastructure resources.

Terraform Cloud and Terraform Enterprise offer an integrated audit log that records all relevant events (such as triggering a plan, changes to variables, team assignments, or executing a terraform apply) in chronological order, in immutable form, and in an exportable format. If necessary, these logs can be fed into external SIEM systems to meet security policies, legal requirements, or industry-specific compliance standards such as ISO 27001, BSI C5, or SOC 2.

For companies with particularly high traceability and revision security requirements, it is advisable to complement this with central log management based on solutions like Elastic Stack or Splunk. This allows not only activities at the Terraform level but also contextual information from GitLab, CI/CD pipelines, identity providers, and cloud APIs to be correlated and analyzed.

Mechanisms that link changesets with ticketing systems (e.g. Jira) and a clear process for change approvals are particularly helpful. They ensure that every change is reviewed and approved by at least one authorized person. This is a central component of separation of duties and least privilege strategies.
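A merge request check can enforce both rules before a change reaches the apply phase. The following sketch assumes that changes reference Jira issue keys in their title (pattern PROJ-123) and that the set of authorized approvers is known; the function name and messages are hypothetical.

```python
import re

# Jira-style issue key such as OPS-42; the exact project key pattern is an assumption.
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def release_blockers(title: str, author: str, approvers: set[str], authorized: set[str]) -> list[str]:
    """Return the reasons a change may not be applied yet (an empty list means OK)."""
    problems = []
    if not TICKET_KEY.search(title):
        problems.append("no ticket reference in title")
    # Separation of duties: the author's own approval never counts.
    valid_approvers = (approvers & authorized) - {author}
    if not valid_approvers:
        problems.append("no approval by an authorized person other than the author")
    return problems
```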

Conclusion

Scaling across organizational boundaries demands more than pure Terraform expertise. Those who combine clearly defined team structures, consistent policy checks, automated compliance processes, and modern CI/CD pipelines with the multi-tenancy capabilities of Terraform Cloud or Enterprise create the foundation for secure, reproducible, and efficient infrastructure that reliably functions even across many tenants.

Ralf Ramge

Founder, Cloud Architect & IT Consultant

ICT.technology
