Platform-specific setup guides:
- Windows users: See WINDOWS.md
- macOS users: See MACOS.md
- Linux users: See LINUX.md
- Go to Google Cloud Console and log in with your Google account.
- Note the project ID you'll be using for the workshop.
In your terminal, run the following commands:
```sh
# Log in to Google Cloud
gcloud auth login

# Set the workshop project (if not prompted during login)
gcloud config set project cloud-labs-workshop-42clws

# Set up application default credentials for Terraform
gcloud auth application-default login

# Verify your setup
gcloud config list
```

Before starting the workshop, run the setup verification script for your platform:

```sh
# Linux/macOS
./verify-setup.sh
```

```powershell
# Windows PowerShell
.\verify-setup.ps1
```

These scripts will check that you have all required tools installed and properly configured.
If you encounter issues during setup, consult your platform-specific guide for detailed troubleshooting steps.
All Terraform, gcloud, and git commands work identically across platforms. Throughout this workshop, "terminal" refers to any command-line interface (Terminal on macOS/Linux, Command Prompt/PowerShell on Windows, etc.).
This workshop is designed for intermediate to advanced Terraform users. You'll learn advanced concepts through hands-on exercises covering:
- State management fundamentals - Understanding and manipulating Terraform state
- Remote state backends - Setting up shared state storage and locking
- Modularization - Refactoring monolithic configurations into reusable modules
- Advanced validation - Using checks, validations, and type constraints
- Choose-your-own-adventure tracks - Deep dives into specialized topics
Prerequisites: Familiarity with basic Terraform concepts (resources, variables, outputs) and HCL syntax.
This part will focus on Terraform state fundamentals and basic state manipulation.
Terraform uses state to map resource configurations to real-world infrastructure. The state file (`terraform.tfstate`) is a JSON file that contains the current state of your infrastructure. It is essential for Terraform to function correctly, as it allows Terraform to track resources and their dependencies. It does not contain all resources in your project, only the resources Terraform knows about.
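For illustration, a heavily abbreviated state file might look like this (values are made up; real files contain many more fields):

```json
{
  "version": 4,
  "terraform_version": "1.10.0",
  "serial": 5,
  "resources": [
    {
      "mode": "managed",
      "type": "google_compute_network",
      "name": "vpc",
      "provider": "provider[\"registry.terraform.io/hashicorp/google\"]",
      "instances": [{ "attributes": { "id": "...", "name": "..." } }]
    }
  ]
}
```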
The state file can contain sensitive information, so it should be stored securely. Terraform 1.10 introduced ephemerality via the special `ephemeral` block, which partially solves this issue by not storing secrets in state. You can read more about it in the docs.
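As a sketch of what this looks like (assuming Terraform >= 1.10 and a provider with ephemeral resource support, such as the `random` provider), an ephemeral value is declared with the `ephemeral` block type and is never written to state:

```hcl
# Sketch: the generated password is available during the run,
# but is not persisted to the state file.
ephemeral "random_password" "db_password" {
  length = 20
}
```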
Let's start with a simple example of a monolithic configuration. We've provided a starter configuration in infra/main.tf that creates a couple of networks, DNS records and service accounts.
- Create a new file `infra/terraform.auto.tfvars` with the following content:

  ```hcl
  name_prefix = "<your-unique-prefix>"
  project_id  = "cloud-labs-workshop-42clws"
  ```

  Your `name_prefix` should be unique to avoid collisions with other participants. If you run the workshop in your own project, replace the `project_id` accordingly.
- Run `terraform init` and `terraform apply` in the `infra/` folder to provision the resources.
The previous commands will have created a `.terraform/` directory, a `.terraform.lock.hcl` file and a `terraform.tfstate` file. `.terraform.lock.hcl` is the only one of these that should be committed; `.terraform/` contains downloaded providers and modules.
- Open `terraform.tfstate` in your favorite text editor. You will see a JSON file with the current state of your infrastructure. The `resources` key contains a list of all resources managed by Terraform, along with their attributes and metadata.
- `terraform.tfstate` contains many attributes that are not relevant. Run `terraform state list` to see a list of all resources managed by Terraform. You will see a list of resources with their addresses, such as `google_compute_network.vpc` and `google_dns_record_set.records[2]` (example output after this list).
- Run `terraform state show google_compute_network.vpc` to see the details of the `google_compute_network.vpc` resource. You will see a list of all attributes and their values, including the `id`, `name`, `auto_create_subnetworks`, and `project`.
- Run `terraform show` to display the attributes of all Terraform-managed resources.
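For reference, the output of `terraform state list` for the starter configuration will look something like this (exact addresses depend on your configuration):

```sh
$ terraform state list
google_compute_network.vpc
google_compute_subnetwork.subnets[0]
google_compute_subnetwork.subnets[1]
google_compute_subnetwork.subnets[2]
google_dns_managed_zone.private_zone
google_dns_record_set.records[0]
...
```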
Terraform has two mechanisms for changing the address of a resource: `terraform state mv` and the `moved` block. The first acts on the state file directly; the second is most commonly used when renaming resources in the configuration.
- Run `terraform state mv google_compute_network.vpc google_compute_network.vpc_renamed` to rename the `google_compute_network.vpc` resource to `google_compute_network.vpc_renamed`. This will update the state file, but not the configuration.
- Run `terraform plan` to examine how Terraform wishes to fix the configuration drift between the state file and the configuration file. Note especially how this affects dependent resources, like the subnets and the DNS zone. Do not apply this configuration!
- Let's fix the state file with a `moved` block:

  ```hcl
  moved {
    from = google_compute_network.vpc_renamed
    to   = google_compute_network.vpc
  }
  ```

  Examine the output of `terraform plan` again. It should show the change to the state file, but also say "Plan: 0 to add, 0 to change, 0 to destroy" since this involves no actual changes.
- Apply the configuration (using `terraform apply`) before continuing. Then remove the `moved` block from the configuration.
- The `moved` blocks and `terraform state` commands also work on maps and lists of objects, such as `google_compute_subnetwork.subnets` (which has 3 resources). We'll do the same operation, but this time we'll start with the `moved` block.

  Start by changing the resource name in the configuration from `subnets` to `subnets_renamed` in `main.tf`:

  ```hcl
  resource "google_compute_subnetwork" "subnets_renamed" {
    # ... existing configuration
  }
  ```

  Then create a `moved` block that handles the rename in state:

  ```hcl
  moved {
    from = google_compute_subnetwork.subnets
    to   = google_compute_subnetwork.subnets_renamed
  }
  ```

  Run `terraform plan` to verify that there are no infrastructure changes, only state address changes.

- Apply the configuration before reverting the resource name back to `subnets` in the configuration file. Remove the `moved` block.

- After reverting and removing the `moved` block, running `terraform plan` should show 3 resources to be added and 3 to be destroyed. Use `terraform state mv google_compute_subnetwork.subnets_renamed google_compute_subnetwork.subnets` to fix the state file, and run `terraform plan` to verify that there are no changes.
You can read more about the available state manipulation commands in the Terraform documentation.
"Drift" is when the defined state (or code) differs from the actual state of the infrastructure. This can happen for various reasons, such as manual changes made in the cloud provider's console, code changes not properly applied in an environment or changes made by other tools or scripts.
To handle drift, Terraform executes a "refresh" before plan or apply operations to update the state file with the real-world status. It then reconciles the tracked resources in the state file with their actual status.

You can trigger a refresh manually with `terraform refresh` or by using the `-refresh-only` flag with `terraform plan` and `terraform apply`. Refreshing the state separately is normally not necessary when running `terraform plan` or `terraform apply`, but can be useful in special situations.
- Next, go into the Google Cloud Console and search for "VPC networks" in the top middle search bar. Click on the "VPC networks" link and find your VPC in the list. Click "Subnets" in the menu bar, and delete one of the subnets listed.
- Let's see the effect of refreshing the state.
  - Run `terraform plan -refresh=false`. You should not see any changes to be applied since the state file and configuration files are in sync.
  - Run `terraform plan -refresh-only -out="plan.tfplan"`. You can see from the output which resources Terraform refreshes the status of. Terraform will generate a plan to update the state file.
  - Run `terraform apply plan.tfplan` to apply the changes to the state file.
  - (Optional) You can compare the state file by looking at the difference between `terraform.tfstate` and `terraform.tfstate.backup`.
  - Run `terraform plan -refresh=false` again. Terraform will now detect a difference between the state file and the configuration, and will show the plan to change the real-world state back to the desired state defined by the configuration.
- Run `terraform apply` and apply the configuration to get the infrastructure back to the desired state.
Note
`terraform apply -refresh-only` will give the option to update the state file without generating an intermediate plan file (generally, all arguments given to `terraform plan` can also be given to `terraform apply`).

`terraform refresh` can be used to refresh and write the result to the state file directly without reviewing it. This is most similar to what is actually done by `terraform plan` before generating the actual plan.
Our single configuration file is quite long and unmanageable. We can split it into multiple files to improve organization. As long as we keep the resources in files within the same directory, with the same resource name, Terraform will consider them the same. Moving files to different directories will create modules. For now, we will just create a logical split of files to simplify later refactoring.
Terraform has a style guide, which contains a section about file names. We will follow this style guide to reorganize our code.
- Look up the file names section in the style guide.
- Move all the non-resource blocks (variables, providers, etc.) into their respective files.
- You should now have files named `terraform.tf`, `providers.tf`, `variables.tf` and `main.tf` (a sketch of `terraform.tf` follows after this list).
- The remaining resources in `main.tf` can be split into files based on their function (e.g., `dns.tf`, `network.tf`, etc.), if you want to.
- When you're done, run `terraform plan` to verify that there are no changes to the configuration.
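As an illustration, a minimal `terraform.tf` might look like this (the version constraints are examples, not the workshop's pinned versions):

```hcl
# terraform.tf: Terraform and provider version requirements
terraform {
  required_version = ">= 1.9.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 5.0"
    }
  }
}
```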
By default, Terraform stores the state locally in a `terraform.tfstate` file. When using Terraform in a team, it is important for everyone to work with the same state so that operations are applied to the same remote objects.

With remote state, Terraform writes the state data to a remote data store, which can be shared between all members of a team. Remote state can be implemented by storing state in Amazon S3, Azure Blob Storage, Google Cloud Storage and more. Terraform configures the remote storage with a `backend` block.
- Run the following command to create a GCS bucket:

  ```sh
  gcloud storage buckets create gs://<bucket_name> --project=cloud-labs-workshop-42clws --location=europe-west1
  ```

  `<bucket_name>` can be any globally unique string; we recommend `<your_prefix>_state_storage_<random_string>`. The `<random_string>` should be 4-6 random lowercase letters or numbers.

- Update the Terraform configuration to use the provisioned bucket as a backend:

  ```hcl
  terraform {
    backend "gcs" {
      bucket = "<bucket_name>"
      prefix = "terraform/state"
    }
  }
  ```

- Run `terraform init` and migrate the state to the bucket.

- Verify that the state is located in the storage bucket (`<prefix>` is defined as `terraform/state` above):

  ```sh
  gcloud storage ls gs://<bucket_name>/terraform/state
  ```

- If you want to view the contents of the state file, run `gcloud storage cat <path_to_state_file>`. The path to the state file is found in the output of the previous command.
- As long as the backend supports state locking, Terraform will lock your state for all operations that could write to it, preventing others from acquiring the lock and potentially corrupting the state. GCS supports state locking, so this happens automatically. This is especially important when working in a team or when automated workflows (such as CI/CD pipelines) may run Terraform simultaneously, as it ensures only one operation can modify the state at a time.
- State locking can be verified as follows: change the `google_dns_managed_zone.private_zone` resource name and run `terraform apply`, but leave it at the approval prompt. Then, in another terminal window, run `terraform plan`. You should see that the state file is locked by the `terraform apply` operation (example error output after this list).
- Also see the GCS remote backend docs for more info.
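The blocked `terraform plan` should fail with an error similar to the following (details trimmed):

```text
Error: Error acquiring the state lock

Lock Info:
  ID:        ...
  Path:      gs://<bucket_name>/terraform/state/default.tflock
  Operation: OperationTypeApply
  ...
```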
Terraform modules improve code reuse, organization and readability. Modules can be used to create reusable components in a repository or create a library of reusable components shared between teams.
We'll start with a network module that is responsible for creating both the VPC and the subnets. We'll also have to take care to not modify the existing resources, and will use the moved block to avoid actual changes to the infrastructure.
Modules are defined in their own directory, and can be used by referencing the module's source. The module's source can be a local path, a Git repository or a Terraform registry. It's common to gather modules at the repository root in the modules/ folder.
- Create the `modules/network` directory at the repository root.
- Create `modules/network/main.tf` and move the `google_compute_network` and `google_compute_subnetwork` resources into it.
- We'll need to pass variable definitions to the module; modules follow the same naming conventions. Create `modules/network/variables.tf` and copy the required variables (`name_prefix`, `regions`, `subnet_cidrs`) from `variables.tf` into it. Remove the variable defaults, if any.
- Other resources need to reference the VPC ID, so we'll need an output. Create `modules/network/outputs.tf` with the following content:

  ```hcl
  output "vpc_id" {
    description = "The ID of the VPC"
    value       = google_compute_network.vpc.id
  }
  ```

- Add a `module` block, replacing the previous network configuration file, to call the module:

  ```hcl
  module "network" {
    source       = "../modules/network"
    name_prefix  = var.name_prefix
    regions      = var.regions
    subnet_cidrs = var.subnet_cidrs
  }
  ```

  And update `google_dns_managed_zone.private_zone` to refer to the module output `vpc_id` in the `network_url` argument:

  ```hcl
  network_url = module.network.vpc_id
  ```
- Run `terraform init` and then `terraform plan` to verify that the changes are syntactically correct. Fix errors before continuing. Note that the plan will show changes to the infrastructure! But we can do this without changes to the infrastructure by using the `moved` block.
- When moving between modules, the `moved` block must be in the module you moved from (in this case the root module). Add these `moved` blocks in the same file as the module declaration:

  ```hcl
  moved {
    from = google_compute_network.vpc
    to   = module.network.google_compute_network.vpc
  }

  moved {
    # Note: We move all three subnets in the list at once
    from = google_compute_subnetwork.subnets
    to   = module.network.google_compute_subnetwork.subnets
  }
  ```

  Run `terraform plan` again and verify that there are no changes, except moving resources.
- Apply the moves with `terraform apply`. This will move the resources in the state file without changing the infrastructure. Run `terraform plan` and see that the moves are no longer planned actions.
- Delete the `moved` blocks from the configuration file.
Caution
Removing moved blocks in shared modules can cause breaking changes to consumers that haven't applied the move actions yet. This is not a problem here since we're the only consumer of the module. Read more in the docs.
The design of the network module can be improved; we'll get back to ways to do that in the extra tasks section later in the workshop.
For the DNS configuration, we'll only create a module for creating a single DNS A record, leaving the DNS zone in the root module.
- Following similar steps to creating the `network` module, create a `dns_a_record` module. This module should have three `string` variable inputs (`name`, `zone_name` and `ipv4_address`) and create a single `google_dns_record_set` resource named `record`. You will need to modify the resource to use the new variables:

  ```hcl
  resource "google_dns_record_set" "record" {
    name         = var.name
    type         = "A"
    ttl          = 300
    managed_zone = var.zone_name
    rrdatas      = [var.ipv4_address]
  }
  ```

- We can then use a loop with the `count` meta-argument in the root module when we call the module:

  ```hcl
  module "records" {
    count = length(var.dns_records)

    source = "../modules/dns_a_record"

    name         = "${var.dns_records[count.index]}.${var.name_prefix}.workshop.internal."
    zone_name    = google_dns_managed_zone.private_zone.name
    ipv4_address = "10.0.0.${10 + count.index}"
  }
  ```
- When refactoring resources that use looping this way, we need to use a `moved` block per resource:

  ```hcl
  moved {
    from = google_dns_record_set.records[0]
    to   = module.records[0].google_dns_record_set.record
  }

  moved {
    from = google_dns_record_set.records[1]
    to   = module.records[1].google_dns_record_set.record
  }

  moved {
    from = google_dns_record_set.records[2]
    to   = module.records[2].google_dns_record_set.record
  }
  ```

- Verify that the only actions are to move state, and apply the changes before removing the `moved` blocks.
The `service_account` module is a bit different, since it creates multiple resources using loops. The logic could be greatly simplified if we designed the module to create a single service account and assign it a set of roles.
The `service_account` module should have the variables `account_id`, `display_name`, `description`, `project_id` and `roles`. I.e., we want a module that can replace the current looping logic with a module call similar to this:

```hcl
module "service_accounts" {
  count = length(var.service_accounts)

  source = "../modules/service_account"

  account_id   = "${var.name_prefix}-${var.service_accounts[count.index]}"
  display_name = "<removed for brevity>"
  description  = "<removed for brevity>"
  project_id   = var.project_id
  roles        = var.project_roles
}
```

- Create the `service_account` module in `modules/service_account` and use it (a possible implementation is sketched below). Note: This refactoring is not required to complete future tasks, so feel free to skip it or come back to it later.
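One possible implementation of `modules/service_account/main.tf` is sketched below (the resource names are our assumptions; the variables match those listed above):

```hcl
# modules/service_account/main.tf (sketch)
resource "google_service_account" "this" {
  account_id   = var.account_id
  display_name = var.display_name
  description  = var.description
  project      = var.project_id
}

# Grant each requested role to the service account on the project
resource "google_project_iam_member" "roles" {
  for_each = toset(var.roles)

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.this.email}"
}
```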
Managing multiple environments (e.g., development, staging, production) is a core requirement for infrastructure as code. This allows you to test changes in a safe environment before applying them to production. In this track, we will explore two common strategies for managing environments in Terraform: Directory-based separation and Workspaces.
| Feature | Directory-based environments | Terraform workspaces |
|---|---|---|
| Clarity | High: Directory path clearly indicates environment. | Low: Must check CLI status to know active workspace. |
| Isolation | High: Completely separate state files and configurations. | Medium: Shared backend config, separate state paths. |
| Flexibility | High: Easy to vary config per environment (e.g. instance sizes). | Low: Config must be identical; variations via variables/lookups. |
| Simplicity | Low: Managing multiple directories can lead to duplication. | High: Single set of configuration files. |
| Risk | Low: Harder to apply to wrong env. | High: Easy to forget to switch workspace. |
| Drift | Environments can diverge if changes aren't propagated. | Guarantees consistent infrastructure definition. |
| Use case | Best for teams needing clear separation and flexibility. | Best for small teams or simple environments. |
This approach involves creating separate directories for each environment (e.g., envs/dev, envs/prod). Each directory contains the Terraform configuration for that specific environment and maintains its own state file.
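The resulting layout looks roughly like this:

```text
envs/
├── dev/
│   ├── main.tf
│   ├── ...
│   └── terraform.auto.tfvars
└── prod/
    ├── main.tf
    ├── ...
    └── terraform.auto.tfvars
```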
Task: Implement Directory-based Environments
- Create a new directory structure `envs/dev` and `envs/prod` in the root of your repository.
- Copy the contents of your existing `infra/` directory into both `envs/dev` and `envs/prod`.
- Set the `name_prefix` variable differently for each:
  - In `envs/dev`: Keep your existing `name_prefix = "<your-prefix>"`.
  - In `envs/prod`: Set a new prefix, e.g., `name_prefix = "<your-prefix>prod"`.
- In `envs/prod`, adjust the `backend` block to set a new `prefix`, e.g. `prefix = "terraform/prod/state"`. Otherwise `envs/dev` and `envs/prod` will use the same state file.
- Initialize and plan both environments. You might have to run `terraform init -reconfigure` in `envs/prod` if you copied the `.terraform/` directory. Plan `envs/dev` first and verify that there are no changes; then plan `envs/prod` and verify that all resources will be created without destroying the dev resources. Apply the changes in prod.
- Verify in the GCP console that both environments have their own set of resources. Also verify that you can find both state files in your storage bucket.
- Destroy the `envs/prod` resources when done testing, using `terraform destroy`.
Terraform Workspaces allow you to manage multiple state files associated with a single configuration directory. You switch between workspaces using the CLI.
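The active workspace is also exposed inside the configuration as the `terraform.workspace` named value, which is another common way to vary settings per environment. This workshop uses separate `.tfvars` files instead, so the snippet below is purely illustrative:

```hcl
# Illustrative only: derive a per-environment name prefix from the active workspace
locals {
  effective_prefix = terraform.workspace == "default" ? var.name_prefix : "${var.name_prefix}-${terraform.workspace}"
}
```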
Task: Implement Workspaces
- Go to your original `infra/` directory.
- List the current workspaces: `terraform workspace list` (you should see `default`).
- Create a new workspace for production: `terraform workspace new prod`.
- Rename the `terraform.auto.tfvars` file to `default.tfvars`. Create a new `prod.tfvars` by copying the content of `default.tfvars`. Change the prefix, e.g., `name_prefix = "<your-prefix>prod"`. Note that the renamed files are no longer auto-loaded, so pass the matching file to Terraform with `-var-file` from now on.
- Switch between workspaces: `terraform workspace select default` (let's treat default as dev) or `terraform workspace select prod`.
- Run `terraform plan -var-file=prod.tfvars` in the `prod` workspace. Notice how Terraform plans to create a new set of resources because the state is empty for this workspace.
- Find the state file in your storage bucket. It should be located at `terraform/state/prod.tfstate`.
- Destroy the `prod` workspace resources when done testing, using `terraform workspace select prod` followed by `terraform destroy -var-file=prod.tfvars`. Then you can delete the workspace with `terraform workspace delete prod` (after switching to another workspace).
Both approaches have their pros and cons. Directory-based environments provide better isolation and flexibility, while workspaces offer simplicity and ease of management. Choose the approach that best fits your team's workflow and project requirements.
Note how we used backend prefixes and variable files differently in both approaches to separate environments, and how the state files are stored differently in the remote backend. Both approaches are useful, and you can even combine them for more complex scenarios.
Terraform Cloud is a managed service provided by HashiCorp for running Terraform workflows in a collaborative and secure environment. It helps teams manage infrastructure as code at scale by handling Terraform execution, state storage, access control, version control integration and policy enforcement — all in the cloud. In this part we will explore how to store state in Terraform Cloud and run automatic plan on PR.
Go to Terraform Cloud and create a free account. Once signed in, create an organization to store your infrastructure. HCP Terraform organizes your infrastructure resources by workspaces in an organization.
A workspace in HCP Terraform contains infrastructure resources, variables, state data, and run history. HCP Terraform offers a VCS-driven workflow that automatically triggers runs based on changes to your VCS repositories. The VCS-driven workflow enables collaboration within teams by establishing your shared repositories as the source of truth for infrastructure configuration. Complete the following steps to create a workspace:
- After selecting an organization, click New and choose Workspace from the dropdown menu.
- Choose a project to create the workspace in, and click create.
- Configure the backend to let Terraform Cloud manage your state:

  ```hcl
  terraform {
    cloud {
      hostname     = "app.terraform.io"
      organization = "<your-organization-name>"

      workspaces {
        name = "<workspace-name>"
      }
    }
  }
  ```

Initialize the state with `terraform init`.
In order to trigger HCP Terraform runs from changes to VCS, you first need to create a new repository in your personal GitHub account.
In the GitHub UI, create a new repository. Name the repository learn-terraform, then leave the rest of the options blank and click Create repository.
Copy the remote endpoint URL for your new repository.
In the directory of your source code, update the remote endpoint URL for your repository to the one you just copied: `git remote set-url origin YOUR_REMOTE`. Add your changes, commit, and push to your personal repository.
To connect your workspace with your new GitHub repository, follow the steps below:
- In your workspace, click on VCS workflow and choose an existing version control provider from the list or configure a new one. If you choose GitHub App, choose an organization and repository when prompted. You can choose your own private repositories by clicking on "Add another organization" and selecting your GitHub account.
- Under advanced options, set the Terraform Working Directory to infra and click Create.
Terraform Cloud also needs access to Google Cloud resources to be able to run necessary changes. We therefore need to add a workspace variable called GOOGLE_CREDENTIALS containing a service account key.
- Go to Google Cloud -> IAM & Admin -> Service Accounts and locate the Terraform Cloud service account (`terraform-cloud-sa-clws@cloud-labs-workshop-42clws.iam.gserviceaccount.com`).
- Under Actions, click on Manage keys and choose Create new key under the Add key dropdown (alternatively, use the gcloud command shown after this list).
- Head over to Terraform Cloud and, under your workspace variables, add a variable named `GOOGLE_CREDENTIALS` with the service account key as the value.
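If you prefer the command line, the key can also be created with gcloud (a sketch; this downloads the key to `key.json`, so treat it as a secret and delete it after configuring the variable):

```sh
# Create and download a key for the workshop's Terraform Cloud service account
gcloud iam service-accounts keys create key.json \
  --iam-account=terraform-cloud-sa-clws@cloud-labs-workshop-42clws.iam.gserviceaccount.com
```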
You should now be able to automate `terraform plan` on PRs.
- Creating workflow files for Terraform
- Implementing secure credential handling for GCP
- Options:
- GitHub Actions setup
- Cloud Build integration
- Pull request automation (plan on PR, apply on merge)
- Hands-on exercises with:
  - `count` vs. `for_each`
  - `for` expressions
  - `flatten` and other collection functions
  - Practical GCP examples (e.g., managing multiple GKE node pools)
- Performance considerations
- Use cases for dynamic blocks
- Lab: Implementing complex GCP resource configurations with dynamic blocks
- Discussion: Performance implications and maintainability trade-offs
- Best practices and anti-patterns
- Unit testing with Terratest
- Policy validation with OPA/Conftest
- Static analysis and linting
- Implementing pre-commit hooks
- GCP-specific compliance checks
Terraform has a type system covering basic functionality, and it allows for type constraints in the configuration. Additionally, the language supports different kinds of custom conditions to validate assumptions and provide error messages.
The network module has two `list(string)` input variables, `regions` and `subnet_cidrs`, that are expected to be of the same length. Let's look at different ways of verifying this. First, let's get an introduction to variable validation.
- The `subnet_cidrs` variable is of type `list(string)`; we would like to validate that the ranges specified are valid. We can use a `validation` block inside our `variable` declaration. This would look like this:

  ```hcl
  variable "subnet_cidrs" {
    description = "CIDR ranges for subnets"
    type        = list(string)

    validation {
      condition     = alltrue([for cidr in var.subnet_cidrs : can(cidrhost(cidr, 255))])
      error_message = "At least one of the specified subnets was too small, or one of the CIDR ranges was invalid. The subnets need to contain at least 256 IP addresses (/24 or larger)."
    }
  }
  ```

  We added the `validation` block; the rest should be as before. Let's explain what's going on here:

  - The `condition` is a boolean expression that must be true for the validation to pass.
  - `alltrue` is a function that returns true if all elements in the list are true.
  - `for` is a for expression that iterates over the list of CIDR ranges and checks if each one is valid using the `cidrhost` function.
  - We can refer to the variable being validated with the same syntax as before: `var.subnet_cidrs`.
  - The `can` function is a special function that returns true if the expression can be evaluated without errors. I.e., if `cidrhost` returns an error due to an invalid or too small IP address range, `can` will return false.
  - If `condition` evaluates to false, the plan fails and the `error_message` will be printed.

  Add the validation and try to provoke a validation error by specifying a smaller IP address range (e.g., a `/25`) or specifying an invalid IP address. Make sure the code works again before moving to the next step.
Note
For the following steps we will write the same code in different ways, so you might want to commit (or make a copy) of your code, in order to be able to revert later. If you've already done the tasks in Track D you might want to revert your changes to the network module.
- Depending on your use case, the best way might be to combine the variables into a structured type containing a list of objects with the properties `region` and `cidr` (the type definition would be `list(object({ region = string, cidr = string }))`). This would use the type system to ensure the assumptions are always correct. In Track D we cover this refactoring approach, and will not repeat it here.
- Terraform 1.9.0 (released June 2024) introduced general expressions and cross-object references in input variable validations. This lets us refer to other variables, locals and more during validation.

  a. Add a new validation to either the `regions` or the `subnet_cidrs` variable to ensure that the two lists are of equal length. The condition should be `length(var.regions) == length(var.subnet_cidrs)`.
  b. Verify that the validation fails if the number of regions and CIDRs are not the same.
- For the purposes of this workshop, we can do a different refactoring to illustrate multiple validation blocks: let's combine the `regions` and `subnet_cidrs` variables into a `subnets` variable with type `object({ regions = list(string), cidrs = list(string) })` (a possible shape is sketched after this list).

  a. Remove the validation from the previous step, and modify the validation from the first step to work with the new variable type. Also update the module and the calling code to work with the new variable type definition. Make sure `terraform plan` succeeds without modifications.
  b. Add a second validation block to the `subnets` variable that verifies that the two lists have the same length. Add an appropriate error message.
  c. Verify that the validation fails if the number of regions and CIDRs are not the same.
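If you get stuck, a possible shape for the combined variable with both validation blocks looks like this (error messages are examples):

```hcl
variable "subnets" {
  description = "Regions and CIDR ranges for subnets"
  type = object({
    regions = list(string)
    cidrs   = list(string)
  })

  # Each CIDR range must be valid and contain at least 256 addresses
  validation {
    condition     = alltrue([for cidr in var.subnets.cidrs : can(cidrhost(cidr, 255))])
    error_message = "All CIDR ranges must be valid and /24 or larger."
  }

  # Both lists must have the same number of elements
  validation {
    condition     = length(var.subnets.regions) == length(var.subnets.cidrs)
    error_message = "regions and cidrs must contain the same number of elements."
  }
}
```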
Checks let you define custom conditions that execute on every plan or apply operation. Failing checks produce warnings but do not otherwise affect Terraform's execution of operations.
- Checks are very flexible. We can write the input validations from the previous step as assertions. Transform the validation of subnet CIDRs into a check. The general syntax of a check is:

  ```hcl
  check "check_name" {
    // Optional data block

    // At least one assertion block
    assert {
      condition     = 1 == 1 // condition boolean expression
      error_message = "error message"
    }
  }
  ```
- You can add data blocks in checks to verify that a property of some resource outside the configuration or the scope of the module is correct. E.g., validate assumptions on VMs or clusters, check that resources are securely configured, or check that a website responds with 200 OK after Terraform has run, using the `http` provider.

  In the `dns_a_record` module, write a check that uses the `google_dns_managed_zone` data source and verifies that `visibility` is `"private"` (one possible shape is sketched at the end of this section).

  Apply the changes to the configuration.
- To see the check from the previous step in action, you can run `terraform destroy` to remove the resources and then apply the configuration again. Note how it gives you a warning during the `plan` step, since the managed zone does not exist yet. Terraform will still provision the DNS records, however, independent of the state of the checks.
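For reference, one possible shape for the check from the data source step above (a sketch; the data source argument and the `visibility` attribute are taken from the Google provider docs):

```hcl
# modules/dns_a_record/checks.tf (sketch)
check "zone_is_private" {
  data "google_dns_managed_zone" "zone" {
    name = var.zone_name
  }

  assert {
    condition     = data.google_dns_managed_zone.zone.visibility == "private"
    error_message = "The managed zone used by this record must be private."
  }
}
```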