Setup¶
The effective folder structure:
.
├── terraform/
│   └── envs/                     # Environment-specific configurations
│       ├── dev/
│       │   ├── main.tf           # Calls upstream modules (VPC, RDS, ECS, etc.)
│       │   ├── backend.tf        # Remote backend config (state isolation)
│       │   ├── variables.tf      # Variable definitions for this env
│       │   ├── providers.tf      # Provider config (AWS, GCP, Azure) per env
│       │   └── dev.tfvars        # Only immutable, non-sensitive defaults
│       ├── staging/              # Mirrors dev; used for integration tests
│       └── prod/                 # Mirrors staging; stricter quotas and policies
│
├── versions.tf                   # Locks Terraform & provider versions
├── .tflint.hcl                   # Linting rules
├── .checkov.yml                  # Static security analysis rules
├── .infracost.yml                # Cost estimation config
├── .pre-commit-config.yaml       # Developer hygiene hooks
├── .trivyignore                  # Ignore rules for container scans
└── README.md                     # Explains repo usage and structure
2. Modules in practice¶
2.1 Direct call pattern (envs/prod/main.tf, abridged)¶
# VPC
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"   # Official module
  version = "~> 6.0"                          # Pin MAJOR, allow patch/minor upgrades

  name = "${var.service}-vpc"
  cidr = var.cidr_block

  # Take the first 3 AZs to spread resources
  azs             = slice(data.aws_availability_zones.available.names, 0, 3)
  public_subnets  = var.public_subnets
  private_subnets = var.private_subnets

  enable_nat_gateway = true

  tags = local.tags
}

# RDS
module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.12"

  identifier     = "${var.service}-db"
  engine         = "postgres"
  engine_version = "16.2"
  instance_class = "db.t4g.medium"

  subnet_ids = module.vpc.private_subnets
  multi_az   = true                           # Highly available

  backup_retention_period = 7
  deletion_protection     = true              # Safety guard

  username = var.db_user
  password = data.aws_secretsmanager_secret_version.db_pwd.secret_string

  tags = local.tags
}

# ECS
module "ecs" {
  source  = "terraform-aws-modules/ecs/aws"
  version = "~> 5.13"

  cluster_name       = "${var.service}-${var.environment}"
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  services = {
    web = {
      cpu           = 512
      memory        = 1024
      desired_count = 2

      # Container definition encoded as JSON
      container_definitions = jsonencode([{
        name         = "web"
        image        = "${var.image}@${var.image_digest}"
        portMappings = [{ containerPort = 8080 }]
        secrets = [
          # ARN of the Secrets Manager secret fetched via the data source above
          { name = "DB_URL", valueFrom = data.aws_secretsmanager_secret_version.db_pwd.arn }
        ]
        environment = [
          { name = "SPRING_PROFILES_ACTIVE", value = var.environment }
        ]
      }])

      load_balancer = {
        target_group_arn = var.alb_target_group_arn   # target group of an ALB defined elsewhere (variable name is illustrative)
      }
    }
  }

  tags = local.tags
}
Rationale:

- Call upstream modules directly and pin versions.
- Inject secrets via ARN, never as plaintext.
- Use locals for tagging consistency.

Rule of thumb: wrap a module only if you need mandatory tags or guard-rails; never fork – you’ll lose the upstream upgrades.
3. Remote backend recipes¶
Terraform Cloud (simplest)¶
Pros: simplest setup, managed state encryption. Cons: a paid tier may be needed for policies.
S3 + DynamoDB (fully OSS)¶
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"   # must exist beforehand
    key            = "prod/terraform.tfstate"
    region         = "eu-central-1"
    dynamodb_table = "terraform-locks"        # provides locking
    encrypt        = true                     # SSE-KMS recommended
  }
}
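The bucket and lock table are created once, outside Terraform. A sketch with the AWS CLI, using the names from the snippet above (adjust region and add SSE-KMS as needed):

aws s3api create-bucket \
  --bucket acme-terraform-state \
  --region eu-central-1 \
  --create-bucket-configuration LocationConstraint=eu-central-1
aws s3api put-bucket-versioning \
  --bucket acme-terraform-state \
  --versioning-configuration Status=Enabled
# The lock table must have a string partition key named LockID
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST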
4. versions.tf¶
terraform {
  required_version = "~> 1.9"   # any 1.x release from 1.9 onwards

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
  }
}
Commit the generated .terraform.lock.hcl so CI is deterministic.
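If developers and CI run on different platforms, record provider hashes for each platform in the lock file (the platform list here is an example):

terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64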
5. Security defaults¶
| Layer | Setting |
|---|---|
| IAM | GitHub OIDC ⇒ short-lived role; no static keys on laptops |
| State | SSE-KMS on S3 / workspace-level encryption on TF Cloud |
| Transit | TLS 1.2+ only (aws_lb_listener default) |
| Data | RDS storage + backups encrypted • automatic minor-version upgrades |
| Secrets | Stored in AWS Secrets Manager; injected by ARN, never in plain vars |
| Static analysis | Pre-commit hook: terraform fmt → tflint → tfsec → checkov |
| Policy as code | Sentinel / OPA rules (e.g. deny public S3) – add when compliance rises |
6. GitHub Actions pipeline (.github/workflows/terraform.yml)¶
name: Terraform

on:
  pull_request:
    # "closed" is needed so the apply job can run once the PR is merged
    types: [opened, synchronize, reopened, closed]
    paths: ["terraform/**"]

permissions: { id-token: write, contents: read }

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false   # validation does not need the remote backend
      - run: terraform validate

  plan:
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # Assume the deploy role via GitHub OIDC (secret name is an example)
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: eu-central-1
      - run: terraform init -backend-config="envs/${{ github.head_ref }}/backend.hcl"
      - run: terraform plan -out=tfplan
      - uses: actions/upload-artifact@v4
        with: { name: tfplan, path: tfplan }

  apply:
    if: github.event.pull_request.merged == true
    environment: prod   # requires manual approval gate
    needs: plan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: eu-central-1
      # Reuse the plan produced in the plan job
      - uses: actions/download-artifact@v4
        with: { name: tfplan }
      - run: terraform init -backend-config="envs/prod/backend.hcl"
      - run: terraform apply -auto-approve tfplan
- OIDC → AWS role assumption. No long-lived keys.
- Manual approval enforced by GitHub environments.
- Plans are generated on PR, applied only after merge.
7. Local developer ergonomics¶
- `make deploy ENV=dev` – wraps `terraform init` / `plan` / `apply` (see the shell sketch after the pre-commit config below).
- Direnv – autoloads environment variables (`AWS_PROFILE`, region).
- `.pre-commit-config.yaml` – keeps style & security on every commit:
repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.79.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate
      - id: terraform_tflint
      - id: terraform_tfsec
      - id: terraform_checkov
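A minimal sketch of what the `make deploy ENV=dev` wrapper can run, shown as plain shell (the script name, folder layout, and tfvars naming are assumptions):

# deploy.sh — invoked by `make deploy ENV=dev`
ENV=${ENV:-dev}
cd "terraform/envs/${ENV}"
terraform init
terraform plan -var-file="${ENV}.tfvars" -out=tfplan
terraform apply tfplan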
8. Cost & drift guard-rails¶
| Tool | Schedule | Action |
|---|---|---|
| Infracost | On every PR | Shows monthly delta in comment |
| terraform plan | Nightly GitHub Action | Exit code 2 ⇒ Slack/Email alert |
| AWS Budgets | 80 % of monthly limit | Sends SNS → Slack |
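A sketch of the nightly drift check from the table above (workflow name, schedule, and alerting step are assumptions; `-detailed-exitcode` returns 2 when the plan contains changes):

# .github/workflows/drift.yml (hypothetical)
name: Drift detection
on:
  schedule:
    - cron: "0 3 * * *"   # nightly
permissions: { id-token: write, contents: read }
jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # add the OIDC credentials step from the pipeline above before running plan
      - name: Plan and detect drift
        working-directory: terraform/envs/prod
        run: |
          terraform init
          # exit code 0 = no changes, 1 = error, 2 = drift
          terraform plan -detailed-exitcode -var-file=prod.tfvars || code=$?
          if [ "${code:-0}" -eq 2 ]; then
            echo "Drift detected – hook up the Slack/Email notification here"
          fi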
9. Scaling & failure-proofing checklist¶
- [ ] Multi-AZ: ALBs, RDS, subnets across ≥ 2 AZs.
- [ ] Auto-scaling: ECS services with Fargate Spot + CPU/latency metrics.
- [ ] Backups: RDS retention 7 days, cross-region snapshot copy.
- [ ] Observability: CloudWatch dashboards/alarms. Add Prometheus/Grafana if needed.
- [ ] Deployments: ECS rolling updates. Optionally add CodeDeploy for blue-green.
10. Upgrade workflow¶
1. Review release notes of each module.
2. Bump the version pin in a branch.
3. Run `terraform init -upgrade` in dev.
4. Validate the plan.
5. Merge → pipeline applies to staging and prod with a gate.

(You usually only do this quarterly; the community publishes clear guides.)
11. When this template stops being enough¶
| Scenario | Next step |
|---|---|
| Multiple squads or > 3 prod environments | Introduce Terragrunt or a platform (Spacelift/Scalr) for DRY-er roots |
| Heavy compliance (PCI/HIPAA, audit trails) | Add OPA/Sentinel CI gates, Service Control Policies, AWS Control Tower |
| Multi-cloud or edge workloads | Look at CDK for Terraform or Crossplane to abstract vendors |
TL;DR printable checklist¶
- One mono-repo with `envs/` for each environment
- Remote state secured & versioned
- Use `terraform-aws-modules/*` directly, pinned to major versions
- OIDC → no static AWS keys
- Pre-commit: fmt, validate, tflint, tfsec, checkov
- GitHub Actions plan on PR, apply on merge with manual prod gate
- Encrypt & tag everything; secrets via Secrets Manager
- Nightly drift plan + Infracost + Budget alerts
Using the Template in dev/prod¶
Each environment is isolated in its own directory under `envs/`. You always `cd` into that environment’s directory before running Terraform. This ensures state, variables, and backends remain separate.
1. Backend configuration¶
envs/dev/backend.tf¶
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "dev/terraform.tfstate"    # path unique to dev
    region         = "eu-central-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
envs/prod/backend.tf¶
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "prod/terraform.tfstate"   # path unique to prod
    region         = "eu-central-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
2. Main module calls¶
Both `main.tf` files are structurally the same. They call upstream modules but parameterise them with environment-specific variables.
envs/dev/main.tf¶
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 6.0"
name = "dev-vpc"
cidr = var.cidr_block
azs = slice(data.aws_availability_zones.available.names, 0, 2)
public_subnets = var.public_subnets
private_subnets = var.private_subnets
enable_nat_gateway = false # cheaper for dev
tags = local.tags
}
envs/prod/main.tf¶
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 6.0"
name = "prod-vpc"
cidr = var.cidr_block
azs = slice(data.aws_availability_zones.available.names, 0, 3)
public_subnets = var.public_subnets
private_subnets = var.private_subnets
enable_nat_gateway = true # HA and scaling for prod
tags = local.tags
}
3. Variables¶
envs/dev/dev.tfvars¶
service = "webapp"
environment = "dev"
cidr_block = "10.0.0.0/16"
public_subnets = ["10.0.1.0/24"]
private_subnets= ["10.0.101.0/24"]
db_user = "devuser"
envs/prod/prod.tfvars¶
service = "webapp"
environment = "prod"
cidr_block = "10.1.0.0/16"
public_subnets = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
private_subnets= ["10.1.101.0/24", "10.1.102.0/24", "10.1.103.0/24"]
db_user = "produser"
Secrets (passwords, keys) must not be stored in `*.tfvars`. Use Secrets Manager or SSM.
4. Commands¶
For development¶
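Mirroring the production commands below, but against the dev directory and tfvars:

cd envs/dev
terraform init
terraform plan -var-file=dev.tfvars
terraform apply -var-file=dev.tfvars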
For production¶
cd envs/prod
terraform init
terraform plan -var-file=prod.tfvars
terraform apply -var-file=prod.tfvars
Key notes¶
- Each `backend.tf` ensures state isolation per environment.
- Each `*.tfvars` file holds only immutable config (CIDRs, naming, sizes), never secrets.
- Resource behaviour differs per environment (e.g. NAT disabled in dev, enabled in prod).
- CI/CD can switch `cd envs/<env>` based on branch or workflow input.
Secrets¶
Secrets are never checked into the repo. Terraform only references them at runtime from a managed secret store. Provisioning works as follows:
1. Store the secret¶
Example: database password in AWS Secrets Manager¶
aws secretsmanager create-secret \
--name db/webapp/prod/password \
--secret-string 'SuperStrongPassword123!'
or for SSM Parameter Store (encrypted with KMS):
aws ssm put-parameter \
--name /db/webapp/prod/password \
--value 'SuperStrongPassword123!' \
--type SecureString
2. Reference secret in Terraform¶
Using Secrets Manager¶
# Fetch the latest version of the secret
data "aws_secretsmanager_secret_version" "db_pwd" {
  secret_id = "db/webapp/${var.environment}/password"
}

module "db" {
  source  = "terraform-aws-modules/rds/aws"
  version = "~> 6.12"

  identifier = "${var.service}-db"
  engine     = "postgres"
  username   = var.db_user
  password   = data.aws_secretsmanager_secret_version.db_pwd.secret_string

  # other RDS inputs...
}
Using SSM Parameter Store¶
data "aws_ssm_parameter" "db_pwd" {
name = "/db/webapp/${var.environment}/password"
with_decryption = true
}
module "db" {
source = "terraform-aws-modules/rds/aws"
version = "~> 6.12"
identifier = "${var.service}-db"
username = var.db_user
password = data.aws_ssm_parameter.db_pwd.value
}
3. Access control¶
- Terraform execution role (from GitHub Actions OIDC or developer IAM role) must have read-only permission for that secret. Example IAM policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "ssm:GetParameter"
      ],
      "Resource": [
        "arn:aws:secretsmanager:eu-central-1:123456789012:secret:db/webapp/*",
        "arn:aws:ssm:eu-central-1:123456789012:parameter/db/webapp/*"
      ]
    }
  ]
}
4. Provisioning flow¶
- Secrets are created once, manually or by a bootstrap script.
- Terraform plan/apply runs and pulls the secret value dynamically from AWS.
- The RDS or ECS module consumes it as input.
- The secret never enters Git history, `.tfvars`, or CI logs (it does end up in the Terraform state, which is why the state itself must be encrypted and access-controlled).
5. Runtime injection into workloads¶
Example with ECS:
container_definitions = jsonencode([{
  name  = "web"
  image = var.image
  secrets = [
    { name = "DB_PASSWORD", valueFrom = data.aws_secretsmanager_secret_version.db_pwd.arn }
  ]
}])
ECS injects the secret into the container environment at runtime.
Recommended “cookie-cutter” starting point¶
Pick → `terramate-io/terramate-quickstart-aws`.
Among the public, maintained starters it is the only one that already folds in all the must-haves we listed earlier while still staying light enough for one engineer.
| Built-in item | Status in template |
|---|---|
| Remote state (S3 + DynamoDB) | Pre-wired |
| OIDC GitHub → AWS | Pre-wired |
| CI/CD GitOps pipeline | GitHub Actions plans → PR approval → auto-apply |
| Drift detection | Scheduled GitHub Action + Slack/PR comment |
| Multi-env / multi-account | stg + prd stacks out of the box |
| Pre-commit (tflint, tfsec, checkov) | Included |
| Scaffolded VPC, NAT, LB, compute layer | VPC + sample EKS; easy to swap to ECS/Lambda |
| Change-detection orchestration | Terramate change-sets (so plans only run where code changed) |
| Terraform/OpenTofu compatible | Yes |
Setting Things up¶
The following is the reference architecture that we will be deploying with Terraform:
We first create an IAM user and configure the AWS CLI.
We are using the Terraform Cloud backend.
We initialise the Terraform backend by declaring the `terraform {}` block in `backend.tf`.
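A minimal sketch of that block (organization and workspace names are placeholders):

terraform {
  cloud {
    organization = "devopssean"

    workspaces {
      name = "dev"
    }
  }
}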
Then we run `terraform login` and create a token to be used on the CLI.
Since we are using Terraform Cloud, we use Terraform Cloud → AWS OIDC dynamic credentials. Steps:
Create the OIDC identity provider in AWS¶
- Open IAM → Identity providers → Add provider → OpenID Connect.
- Set Provider URL to `https://app.terraform.io` without a trailing slash.
- Set Audience to `aws.workload.identity`.
- Save and note the provider ARN.
Create the IAM role the workspace will assume¶
1. Create a new role → Web identity → select the OIDC provider you created.
2. Choose Workspace Run – this is the correct option for standard Terraform Cloud workspaces.
3. Attach a permissions policy for the AWS resources Terraform must manage (for prod attach stricter policies).

The trust policy restricts the token to the exact organisation, project, and workspaces (dev, bootstrap, staging, prod – this is very important). Use `run_phase:*` if one role is shared by plan and apply. Create two roles if plan and apply need different permissions, and set `run_phase:plan` or `run_phase:apply`.

Example trusted entities entry:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::xxxxx"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "app.terraform.io:aud": "aws.workload.identity"
        },
        "StringLike": {
          "app.terraform.io:sub": [
            "organization:devopssean:project:tf-demo:workspace:dev:run_phase:*",
            "organization:devopssean:project:tf-demo:workspace:bootstrap:run_phase:*",
            "organization:devopssean:project:tf-demo:workspace:prod:run_phase:*",
            "organization:devopssean:project:tf-demo:workspace:staging:run_phase:*"
          ]
        }
      }
    }
  ]
}
Add variables in the Terraform Cloud workspace¶
In Workspace → Variables, add environment variables:

- `TFC_AWS_PROVIDER_AUTH=true`
- `TFC_AWS_RUN_ROLE_ARN=arn:aws:iam::...`
- Optionally `AWS_REGION=<region>` if not set in the provider block.
- Optionally `TFC_AWS_PLAN_ROLE_ARN` and `TFC_AWS_APPLY_ROLE_ARN` if using separate roles.
- Use a Variable Set if multiple workspaces share the role.
Warning
Double- and triple-check that the variables are of category Environment, NOT Terraform.
Keep the provider block minimal in each environment¶
envs/dev/providers.tf:
provider "aws" {
region = "eu-central-1" # or omit and set AWS_REGION in workspace vars
default_tags {
tags = {
environment = "dev"
service = var.service
}
}
}
No access keys. No profiles. Terraform Cloud injects short-lived credentials at run time.
Run the workspace¶
- Trigger a plan in Terraform Cloud.
- Confirm the run log shows dynamic credentials in use and no IMDS lookups.
- Apply after review.
Troubleshooting¶
- “No valid credential sources found” means the workspace variables are missing or the trust policy does not match the token. Fix the variables or the `sub` conditions.
- “Invalid audience” means the trust policy expects a different `aud`. Set `app.terraform.io:aud` to `aws.workload.identity` or match your custom audience.
- Access denied during plan or apply means the role’s permissions policy is too narrow. Add the exact actions Terraform needs and re-plan.
Security notes¶
- Lock the trust policy to the specific organisation, project, and workspace to prevent cross-org token reuse.
- Prefer separate roles for plan and apply to enforce read-only during plan if required.
- Use IAM Access Analyzer to validate the trust policy.
- Avoid static keys in Terraform Cloud entirely.
We then apply the configuration in the `terraform/envs/dev` folder:
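A typical CLI-driven run against the Terraform Cloud backend looks like this (sketch):

cd terraform/envs/dev
terraform init
terraform plan
terraform apply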
We then check what was provisioned in Terraform Cloud:
We then destroy the infrastructure:
Info
To select an AMI whilst using an ec2_instance module, search for an AMI string value on https://cloud-images.ubuntu.com/locator/ec2/ and then set it in the `ami` variable.
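Alternatively, a data source can resolve the AMI at plan time instead of hardcoding the ID (a sketch; the name filter targets Ubuntu 22.04 and the owner ID is Canonical’s):

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]   # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

# then: ami = data.aws_ami.ubuntu.id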
For the backend in development we will be using Terraform Cloud (free for up to 5 users). In production we will be using S3 with DynamoDB.
DynamoDB provides state locking, which prevents concurrent apply commands from being executed against the same state.
In dev: Check the setup above
In prod: We first bootstrap the bucket and DynamoDB table through Terraform, then we continue with our normal setup. The provisioning files are in the `terraform/envs/prod/bootstrap` directory.
Another way is to make use of the `cloudposse/tfstate-backend` module. This is the solution that we employed in this project.
We have to set up the bootstrap workspace in Terraform Cloud as we did for dev.
Then:
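Assuming the bootstrap configuration lives in terraform/envs/prod/bootstrap (as above), the run is the usual init and apply:

cd terraform/envs/prod/bootstrap
terraform init
terraform apply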
This creates the state bucket and DynamoDB locking table, along with anything else you have defined in the *.tf file(s). At this point, the Terraform state is still stored locally.
The terraform_state_backend module also creates a new `prod-backend.tf` file (the name can be changed in the variables) that defines the S3 + DynamoDB state backend.
Info
When running the bootstrap in Terraform Cloud, the `local_file` resource from `cloudposse/tfstate-backend` cannot persist files back into your Git repository. This is why you do not see a `backend.tf` file after a successful apply. Instead of relying on `local_file`, you should expose the S3 bucket, DynamoDB table, and region as outputs. After the bootstrap run, Terraform Cloud will print these outputs. You can then copy the generated backend snippet into a versioned `backend.tf` file in your repo.
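A sketch of such outputs; the module label and the exact output attribute names of cloudposse/tfstate-backend are assumptions, so check the module’s documented outputs:

# assumed module label "terraform_state_backend"; attribute names may differ per module version
output "state_bucket" {
  value = module.terraform_state_backend.s3_bucket_id
}

output "lock_table" {
  value = module.terraform_state_backend.dynamodb_table_name
}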
We use that configuration for our production `backend.tf` file.
terraform init -reconfigure   # in the prod/ directory
# OR, if and only if you ran the bootstrap with a local backend:
terraform init -force-copy    # Terraform detects that the state should move to the S3 backend and copies it without prompting
Now the state is stored in the S3 bucket, and the DynamoDB table will be used to lock the state to prevent concurrent modification. This concludes the one-time preparation. Now we can extend and modify our Terraform configuration as usual.
Destroying bootstrap¶
Important
Please read this section in full before implementing.
In this setup:
- Bootstrap runs in Terraform Cloud (remote backend).
- Bootstrap provisions an S3 bucket (`devopssean-prod-terraform-state`) and a DynamoDB table (`devopssean-prod-terraform-state-lock`).
- Other environments (`prod`, `staging`, `dev`) are configured to use those backend resources.
If you want to destroy bootstrap and clean up, follow these steps.
- Check dependencies
  - Confirm that no environments (`prod`, `staging`, `dev`) are still using the bootstrap S3 bucket and DynamoDB table.
  - If they are, migrate them to Terraform Cloud first. Otherwise they will break.
Migration procedure¶
Step 1 – Create Terraform Cloud workspaces
- In Terraform Cloud, create a separate workspace for each environment (`prod`, `staging`, `dev`).
- Point each workspace to the corresponding environment folder in your repository.
- Add AWS credentials or an OIDC role in workspace variables so Terraform Cloud can deploy infrastructure.
Step 2 – Update backend configuration
Replace the S3 backend in each environment’s `backend.tf` with a Terraform Cloud block:
terraform {
  cloud {
    organization = "devopssean"

    workspaces {
      name = "prod"   # or staging/dev as appropriate
    }
  }
}
Remove the old backend "s3"
block entirely.
Step 3 – Re-initialise with migration
In each environment folder:
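For example (interactive; Terraform prompts before copying the state):

cd terraform/envs/prod   # or staging/dev
terraform init -migrate-state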
Terraform will:

- Detect the backend change (S3 → Terraform Cloud).
- Prompt to migrate state.
- Move the `.tfstate` from S3 into Terraform Cloud.
For automation, use:
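# -force-copy answers "yes" to the state-migration prompt non-interactively
terraform init -migrate-state -force-copy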
Step 4 – Validate
- In Terraform Cloud, check that each workspace now shows the existing resources.
- Run `terraform plan` to confirm there is no drift.
Step 5 – Destroy bootstrap
Once all environments are migrated:
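# in the bootstrap folder (path as used earlier in this doc)
cd terraform/envs/prod/bootstrap
terraform destroy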
This deletes the S3 bucket, DynamoDB table, and any other bootstrap resources.
Risks if skipped
- If you destroy bootstrap first, the S3 bucket and DynamoDB table are lost. Then `terraform init --migrate-state` cannot read the old state, and environments will behave as if they are fresh deployments (risk of duplicate resources).

- Go to the bootstrap folder.
- Destroy the bootstrap resources. This removes the backend bucket, DynamoDB table, and all other bootstrap resources.
- Verify state
  - The bootstrap workspace in Terraform Cloud should show 0 resources.
  - S3 should no longer have a bucket named `devopssean-prod-terraform-state`.
  - DynamoDB should no longer have a table named `devopssean-prod-terraform-state-lock`.
Important
After this, any `prod`, `staging`, or `dev` environment still pointing to the old backend will fail. Always migrate them to Terraform Cloud before destroying bootstrap.