Commit 1262308

feat(terraform): support for remote terraform state, introduce managed redis and secrets manager (#274)
This pull request introduces several major improvements and new resources to the Terraform infrastructure codebase, focusing on enhanced state management, expanded cloud resource provisioning, and improved secrets handling. The changes add support for remote Terraform state via S3-compatible object storage, automate backend bootstrapping, and introduce managed Redis and Secrets Manager resources. Additional updates improve cluster configuration and documentation.

**Terraform State Management & Automation:**

- Added support for using an S3-compatible backend for Terraform state, including a new `tfstate` object storage bucket, output wiring, and documentation on state management. A helper script (`init-backend.sh`) was introduced to automate backend initialization and migration, generating a `.backend.hcl` file with the necessary credentials and configuration.
- Updated `.gitignore` to exclude backend configuration and kubeconfig files from version control. (`infrastructure/.gitignore`)

**New Cloud Resources:**

- Added managed Redis provisioning, including instance and credential resources, and corresponding input variables for version and plan selection.
- Added support for STACKIT Secrets Manager, provisioning an instance and user, with outputs for integration.
- Added a model serving token resource and output for AI Model Serving API integration. (`infrastructure/terraform/model_serving.tf`)

**Secrets Management & Seeding:**

- Introduced a new `seed-secrets` Terraform module with documentation, example variables, and configuration to seed the Secrets Manager with required secrets for External Secrets integration.

**Cluster & Networking Enhancements:**

- Upgraded the Kubernetes cluster minimum version and improved node pool specs (larger machine type and disk). Added automatic kubeconfig generation and output, including writing to `kubeconfig.yaml`.
- Improved DNS zone resource configuration with contact email and explicit type. (`infrastructure/terraform/dns.tf`)

**Other Improvements:**

- Updated variable descriptions for clarity and adjusted the default deployment timestamp for resource naming.

---

**Most important changes:**

**Terraform State Management & Automation**

- Added S3-compatible backend support for Terraform state, including a dedicated `tfstate` bucket, outputs, and documentation. Introduced the `init-backend.sh` script for automated backend setup and state migration, generating `.backend.hcl` for credentials/config.

**New Cloud Resources**

- Added managed Redis instance and credential resources, with configurable version and plan variables.
- Added STACKIT Secrets Manager instance and user resources, with outputs for integration.
- Added model serving token resource and output for AI Model Serving API. (`infrastructure/terraform/model_serving.tf`)

**Secrets Management & Seeding**

- Introduced the `seed-secrets` module for seeding Secrets Manager with required secrets, including documentation, example variables, and configuration for External Secrets.

**Cluster & Networking Enhancements**

- Upgraded Kubernetes cluster version and node pool specs, and added automated kubeconfig generation/output to `kubeconfig.yaml`.
- Improved DNS zone resource with contact email and explicit type. (`infrastructure/terraform/dns.tf`)
1 parent 8bd165d commit 1262308

15 files changed: +385 -11 lines changed

infrastructure/.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 **/cr-secret.yaml
 **/customer-values.yaml
 **/auth
-
+**/.backend.hcl
+**/kubeconfig.yaml
 **/*.lock.*
 
 auth

infrastructure/terraform/README.md

Lines changed: 63 additions & 0 deletions
@@ -92,6 +92,69 @@ terraform output -json | jq -r .cluster_name.value
 - **Security**: The sa_key.json file contains sensitive credentials and should never be committed to version control
 - **State Management**: Consider using remote state storage for team environments
 
+## Using the bucket as a Terraform S3 backend (optional)
+
+If you want Terraform state to be stored in the object storage bucket, add a backend block to your root module.
+Note: backend blocks cannot reference resources, so you must hardcode or pass the values via variables/partials.
+
+### Bootstrap script (recommended)
+
+Note: `backend "s3" {}` is already defined in `main.tf`. The bootstrap step still works because it runs `terraform init -backend=false`, which ignores the backend block.
+
+Use the helper script to bootstrap the backend in two phases:
+1) Run a local-only apply to create the bucket + credentials.
+2) Generate `.backend.hcl` and migrate state to S3.
+
+```bash
+./scripts/init-backend.sh
+```
+
+This writes `infrastructure/terraform/.backend.hcl` (contains credentials) and runs `terraform init -force-copy`.
+You can re-run the script at any time; it reuses the existing backend config if present.
+If you want remote state from the start, run this script before your first full `terraform apply`.
+
+If you want non-interactive bootstrap:
+
+```bash
+BOOTSTRAP_AUTO_APPROVE=1 ./scripts/init-backend.sh
+```
+
+Manual phase 1 (if you want to see the exact commands the script runs):
+
+```bash
+terraform init -backend=false
+terraform apply \
+  -target=stackit_objectstorage_bucket.tfstate \
+  -target=stackit_objectstorage_credentials_group.rag_creds_group \
+  -target=stackit_objectstorage_credential.rag_creds
+```
+
+### Manual backend block
+
+```hcl
+terraform {
+  backend "s3" {
+    bucket = "<BUCKET_NAME>"
+    key    = "terraform.tfstate"
+    region = "eu01"
+
+    # Use the same credentials as above
+    access_key = "<ACCESS_KEY>"
+    secret_key = "<SECRET_KEY>"
+
+    endpoints = {
+      s3 = "https://object.storage.eu01.onstackit.cloud"
+    }
+
+    # AWS-specific checks must be disabled for STACKIT
+    skip_credentials_validation = true
+    skip_region_validation      = true
+    skip_s3_checksum            = true
+    skip_requesting_account_id  = true
+  }
+}
+```
+
 ## Cleanup
 
 To destroy all resources:
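
The generated `.backend.hcl` stores the access key and secret key in plain text. As a minimal sketch of one alternative, assuming the credentials are instead supplied through the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables that Terraform's S3 backend can read, the helper below prints a backend file with the two credential assignments removed. The function name `backend_config_without_secrets` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print a backend config file
# with the access_key/secret_key assignments stripped, so the remainder can
# be shared while credentials come from the environment.
backend_config_without_secrets() {
  local src="$1"
  # Drop only lines that assign access_key or secret_key; keep the rest verbatim.
  grep -Ev '^[[:space:]]*(access_key|secret_key)[[:space:]]*=' "${src}"
}
```

One possible use: `backend_config_without_secrets .backend.hcl > backend.public.hcl`, then export the two variables and run `terraform init -backend-config=backend.public.hcl`. The file name `backend.public.hcl` is illustrative only.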

infrastructure/terraform/dns.tf

Lines changed: 5 additions & 3 deletions
@@ -1,7 +1,9 @@
 resource "stackit_dns_zone" "rag_zone" {
-  project_id = var.project_id
-  name       = "${var.name_prefix}-zone"
-  dns_name   = var.dns_name
+  project_id    = var.project_id
+  name          = "${var.name_prefix}-zone"
+  dns_name      = var.dns_name
+  contact_email = "data-ai@stackit.cloud"
+  type          = "primary"
 }
 
 output "dns_nameservers" {

infrastructure/terraform/main.tf

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,5 @@
 terraform {
+  backend "s3" {}
   required_providers {
     stackit = {
       source = "stackitcloud/stackit"
@@ -9,4 +10,5 @@ terraform {
 
 provider "stackit" {
   service_account_key_path = "sa_key.json"
+  default_region           = "eu01"
 }

infrastructure/terraform/model_serving.tf

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+resource "stackit_modelserving_token" "rag_modelserving" {
+  project_id = var.project_id
+  name       = "${var.name_prefix}-modelserving-token"
+
+  # No ttl_duration set -> token does not expire.
+}
+
+output "model_serving_bearer_token" {
+  description = "Bearer token for AI Model Serving API"
+  value       = stackit_modelserving_token.rag_modelserving.token
+  sensitive   = true
+}

infrastructure/terraform/object_storage.tf

Lines changed: 13 additions & 2 deletions
@@ -1,8 +1,19 @@
+# This resource stays stable for 365 days, then changes
+resource "time_rotating" "key_rotation" {
+  rotation_days = 365
+}
+
 resource "stackit_objectstorage_bucket" "documents" {
   name       = "${var.name_prefix}-documents-${var.deployment_timestamp}"
   project_id = var.project_id
 }
 
+resource "stackit_objectstorage_bucket" "tfstate" {
+  name       = "${var.name_prefix}-tfstate-${var.deployment_timestamp}"
+  project_id = var.project_id
+  depends_on = [stackit_objectstorage_credentials_group.rag_creds_group]
+}
+
 resource "stackit_objectstorage_bucket" "langfuse" {
   name       = "${var.name_prefix}-langfuse-${var.deployment_timestamp}"
   project_id = var.project_id
@@ -16,7 +27,7 @@ resource "stackit_objectstorage_credentials_group" "rag_creds_group" {
 resource "stackit_objectstorage_credential" "rag_creds" {
   project_id           = var.project_id
   credentials_group_id = stackit_objectstorage_credentials_group.rag_creds_group.credentials_group_id
-  expiration_timestamp = timeadd(timestamp(), "8760h") # Expires after 1 year
+  expiration_timestamp = timeadd(time_rotating.key_rotation.rfc3339, "8760h")
 }
 
 output "object_storage_access_key" {
@@ -30,5 +41,5 @@ output "object_storage_secret_key" {
 }
 
 output "object_storage_bucket" {
-  value = stackit_objectstorage_bucket.documents.name
+  value = stackit_objectstorage_bucket.tfstate.name
 }

infrastructure/terraform/redis.tf

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+resource "stackit_redis_instance" "rag_redis" {
+  project_id = var.project_id
+  name       = "${var.name_prefix}-redis"
+  version    = var.redis_version
+  plan_name  = var.redis_plan_name
+
+  parameters = {
+    sgw_acl                 = join(",", stackit_ske_cluster.rag_cluster.egress_address_ranges)
+    enable_monitoring       = false
+    down_after_milliseconds = 30000
+  }
+}
+
+
+resource "stackit_redis_credential" "rag_redis_cred" {
+  project_id  = var.project_id
+  instance_id = stackit_redis_instance.rag_redis.instance_id
+}

infrastructure/terraform/scripts/init-backend.sh

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+root_dir="$(cd "${script_dir}/.." && pwd)"
+
+backend_config_file="${BACKEND_CONFIG_FILE:-${root_dir}/.backend.hcl}"
+auto_approve="${BOOTSTRAP_AUTO_APPROVE:-0}"
+
+cd "${root_dir}"
+
+if ! command -v terraform >/dev/null 2>&1; then
+  echo "terraform is not installed or not in PATH." >&2
+  exit 1
+fi
+
+if [ -f "${backend_config_file}" ]; then
+  terraform init -backend-config="${backend_config_file}"
+  exit 0
+fi
+
+echo "Bootstrapping object storage for Terraform state (local backend)."
+terraform init -backend=false
+
+if ! bucket="$(terraform output -raw object_storage_bucket 2>/dev/null)"; then
+  apply_args=(
+    "-target=stackit_objectstorage_bucket.tfstate"
+    "-target=stackit_objectstorage_credentials_group.rag_creds_group"
+    "-target=stackit_objectstorage_credential.rag_creds"
+    "-target=time_rotating.key_rotation" # <--- Add this (needed for creds)
+    "-target=output.object_storage_bucket" # <--- Add this
+    "-target=output.object_storage_access_key" # <--- Add this
+    "-target=output.object_storage_secret_key" # <--- Add this
+  )
+  if [ "${auto_approve}" = "1" ]; then
+    terraform apply -auto-approve "${apply_args[@]}"
+  else
+    terraform apply "${apply_args[@]}"
+  fi
+  bucket="$(terraform output -raw object_storage_bucket)"
+fi
+
+access_key="$(terraform output -raw object_storage_access_key)"
+secret_key="$(terraform output -raw object_storage_secret_key)"
+
+cat > "${backend_config_file}" <<EOF
+bucket = "${bucket}"
+key    = "terraform.tfstate"
+region = "eu01"
+
+access_key = "${access_key}"
+secret_key = "${secret_key}"
+
+endpoints = {
+  s3 = "https://object.storage.eu01.onstackit.cloud"
+}
+
+skip_credentials_validation = true
+skip_region_validation      = true
+skip_s3_checksum            = true
+skip_requesting_account_id  = true
+EOF
+
+chmod 600 "${backend_config_file}"
+
+terraform init -backend-config="${backend_config_file}" -force-copy
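
The script creates `.backend.hcl` with the shell's default permissions and only afterwards restricts it with `chmod 600`, so the credentials are briefly readable under a permissive umask. A minimal sketch of one way to close that window, assuming a subshell-scoped `umask 077`; the function name `write_private_file` is hypothetical and not part of this commit.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): write stdin to a file that is
# owner-only from the moment it is created.
write_private_file() {
  local target="$1"
  (
    # The umask change is confined to this subshell.
    umask 077
    # Remove any existing file first: a redirect keeps the old file's mode.
    rm -f -- "${target}"
    cat > "${target}"
  )
}
```

In the script above, the heredoc could then be piped through `write_private_file "${backend_config_file}" <<EOF ... EOF` in place of `cat > ...` followed by `chmod 600`.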
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+resource "stackit_secretsmanager_instance" "rag_secrets" {
+  project_id = var.project_id
+  name       = "${var.name_prefix}-secrets"
+}
+
+resource "stackit_secretsmanager_user" "rag_secrets_user" {
+  project_id    = var.project_id
+  instance_id   = stackit_secretsmanager_instance.rag_secrets.instance_id
+  description   = var.secretsmanager_user_description
+  write_enabled = var.secretsmanager_user_write_enabled
+}
+
+output "secretsmanager_instance_id" {
+  value = stackit_secretsmanager_instance.rag_secrets.instance_id
+}
+
+output "secretsmanager_username" {
+  value = stackit_secretsmanager_user.rag_secrets_user.username
+}
+
+output "secretsmanager_password" {
+  value     = stackit_secretsmanager_user.rag_secrets_user.password
+  sensitive = true
+}
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# Seed Secrets Manager data with Terraform
+
+This folder writes the `rag-secrets` KV secret used by External Secrets.
+
+Why this is a separate step:
+- The Secrets Manager instance ID and user credentials are created by the main Terraform stack.
+- Provider blocks cannot depend on resources, so we pass them in via variables and run a second apply.
+
+## Steps
+
+1. Apply the main stack to create the Secrets Manager instance and user.
+2. Copy `terraform.tfvars.example` to `terraform.tfvars` and fill in the values:
+   - `vault_mount_path` is the Secrets Manager instance ID.
+   - `vault_username`/`vault_password` are from the `secretsmanager_*` outputs.
+   - `rag_secrets` should include all keys referenced by your ExternalSecret resources.
+     For the cert-manager webhook, store the service account key JSON under `STACKIT_CERT_MANAGER_SA_JSON` (use a heredoc to avoid escaping).
+3. Run:
+   ```bash
+   terraform init
+   terraform plan
+   terraform apply
+   ```
+
+## Security note
+
+All values written by `vault_kv_secret_v2` are stored in Terraform state. Use a secure backend and restrict access.
+
+## Troubleshooting
+
+If you see a 404 from `auth/token/create`, the backend does not allow child token creation.
+This module sets `skip_child_token = true` so Vault uses the login token directly.
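
Step 2 of the README above copies values from the main stack's outputs by hand. As a rough sketch, assuming the main stack has already been applied in a directory you pass in and that the values contain no double quotes or backslashes, the helper below prints the three `vault_*` assignments from the `secretsmanager_*` outputs. The function name `render_seed_tfvars` is hypothetical and not part of this commit, and `rag_secrets` still has to be filled in manually.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the vault_* tfvars
# assignments using outputs of the main Terraform stack.
render_seed_tfvars() {
  local main_stack_dir="$1"
  local instance_id username password

  # -chdir runs terraform against the main stack; -raw prints the bare value.
  instance_id="$(terraform -chdir="${main_stack_dir}" output -raw secretsmanager_instance_id)" || return 1
  username="$(terraform -chdir="${main_stack_dir}" output -raw secretsmanager_username)" || return 1
  password="$(terraform -chdir="${main_stack_dir}" output -raw secretsmanager_password)" || return 1

  # Assumption: values need no HCL escaping (no quotes or backslashes).
  cat <<EOF
vault_mount_path = "${instance_id}"
vault_username   = "${username}"
vault_password   = "${password}"
EOF
}
```

The output contains the password, so redirect it to a file that is excluded from version control and readable only by its owner rather than printing it to a shared terminal or log.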
