halradaideh

Problem
Organizations using GitHub Actions with self-hosted runners on GKE cannot attribute costs to specific repositories, workflows, or jobs because runner pods lack workflow metadata labels.

Solution
Automatically apply workflow metadata as pod labels when jobs start, enabling cost tracking through GKE's existing cost allocation infrastructure.

Reference: https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations

Changes

  1. Worker Implementation (cmd/ghalistener/worker/worker.go):

    • Added updatePodLabelsWithWorkflowMetadata() function
    • Added sanitizeLabelValue() for Kubernetes label validation
    • Modified HandleJobStarted() to apply labels when jobs start
    • Added WithClientset() option for testability
  2. RBAC Permissions (charts/gha-runner-scale-set/templates/manager_role.yaml):

    • Added update and patch verbs to pods rule

Labels Applied

  • github.com/repository: Repository name (sanitized)
  • github.com/workflow: Workflow reference (sanitized)
  • github.com/job: Job display name (sanitized)
  • github.com/job-id: Unique job ID
  • github.com/run-id: Workflow run ID
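
A minimal sketch of how the label set above could be assembled. The `jobInfo` struct and the `workflowLabels` helper are hypothetical names used only for illustration; the PR's actual message type and field names are not shown here. The `sanitize` parameter stands in for the PR's `sanitizeLabelValue()`.

```go
package main

import "strconv"

// jobInfo is a hypothetical stand-in for the job-started message
// the listener receives; the real type and field names may differ.
type jobInfo struct {
	Repository  string
	WorkflowRef string
	JobName     string
	JobID       string
	RunID       int64
}

// workflowLabels builds the five labels listed above. Free-form values
// go through sanitize; the IDs are used as-is.
func workflowLabels(j jobInfo, sanitize func(string) string) map[string]string {
	return map[string]string{
		"github.com/repository": sanitize(j.Repository),
		"github.com/workflow":   sanitize(j.WorkflowRef),
		"github.com/job":        sanitize(j.JobName),
		"github.com/job-id":     j.JobID,
		"github.com/run-id":     strconv.FormatInt(j.RunID, 10),
	}
}
```

The run ID is formatted as a string because Kubernetes label values are always strings, which is also why it appears quoted in YAML output.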

Example Output

labels:
  actions.github.com/organization: example-org
  actions.github.com/scale-set-name: runners-x
  actions.github.com/scale-set-namespace: arc-runners
  github.com/job: build
  github.com/job-id: 1234abcd-5678-90ef-1234-567890abcdef
  github.com/repository: example-repo
  github.com/run-id: "123456789"
  github.com/workflow: example-repo-.github-workflows-build.yaml-refs-heads-main

Key Features

  • Labels applied when jobs actually start (not when pods are created)
  • All values sanitized for Kubernetes compliance
  • Graceful error handling - failures don't affect job execution
  • Works with existing GKE cost allocation and BigQuery exports

Testing

  • ✅ Tested in production environment
  • ✅ Verified labels are correctly applied and sanitized
  • ✅ Confirmed RBAC permissions work correctly

Benefits

  • Real-time cost tracking by repository, workflow, and job
  • Leverages existing GKE cost allocation infrastructure
  • Minimal performance impact (labels are assigned once, when the job starts)
  • Automatic integration with BigQuery cost queries
  • Backward compatible with no breaking changes

This enables organizations to track GitHub Actions costs at the repository, workflow, and job level using their existing GKE cost allocation setup.

Disclaimer: this was developed using Cursor AI, so it needs a proper review.

Add workflow metadata labels to runner pods for cost tracking
@Copilot Copilot AI review requested due to automatic review settings September 21, 2025 12:12

@Copilot Copilot AI left a comment

Pull Request Overview

Adds GitHub workflow metadata labels to runner pods for cost allocation and tracking in GKE environments. This enables organizations to attribute GitHub Actions costs to specific repositories, workflows, and jobs through Kubernetes labels.

  • Implements automatic labeling of runner pods with workflow metadata when jobs start
  • Adds RBAC permissions for pod updates in both manager and kube mode roles
  • Includes label value sanitization for Kubernetes compliance

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File | Description
cmd/ghalistener/worker/worker.go | Core implementation with pod labeling logic, sanitization function, and job handling updates
charts/gha-runner-scale-set/templates/manager_role.yaml | Adds update and patch verbs to pods rule for manager role
charts/gha-runner-scale-set/templates/kube_mode_role.yaml | Adds update and patch verbs to pods rule for kube mode role

Comment on lines +34 to +39
reg := regexp.MustCompile(`[^a-zA-Z0-9._-]`)
sanitized := reg.ReplaceAllString(value, "-")

// Remove consecutive hyphens
reg = regexp.MustCompile(`-+`)
sanitized = reg.ReplaceAllString(sanitized, "-")

Copilot AI Sep 21, 2025

Consider compiling the regular expressions once as package-level variables instead of recompiling them on every function call. This will improve performance when the function is called frequently.

Copilot uses AI. Check for mistakes.
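
A sketch of what the suggestion above might look like, using the same two patterns as the snippet under review. The variable names and the `replaceInvalidChars` helper are illustrative assumptions, not names from the PR.

```go
package main

import "regexp"

// Compiled once at package initialization instead of on every call.
var (
	invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9._-]`)
	repeatedHyphens   = regexp.MustCompile(`-+`)
)

// replaceInvalidChars replaces every character outside [a-zA-Z0-9._-]
// with a hyphen, then collapses runs of hyphens into a single one.
func replaceInvalidChars(value string) string {
	s := invalidLabelChars.ReplaceAllString(value, "-")
	return repeatedHyphens.ReplaceAllString(s, "-")
}
```

A compiled `*regexp.Regexp` is safe for concurrent use, so sharing it across goroutines should not require extra locking.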

sanitized = reg.ReplaceAllString(sanitized, "-")

// Ensure it starts and ends with alphanumeric character
sanitized = strings.Trim(sanitized, "-._")

Copilot AI Sep 21, 2025

After trimming characters, the result could be empty even if the original sanitized string wasn't empty, but the function only checks for empty strings before this step. This could result in multiple values being mapped to 'unknown' when they should have distinct sanitized values.

sanitized = sanitized[:63]
sanitized = strings.Trim(sanitized, "-._")
}


Copilot AI Sep 21, 2025

After truncating and trimming, the sanitized string could become empty, but there's no check to handle this case. Kubernetes accepts an empty label value, but it would carry no information for cost attribution, so a default is preferable.

Suggested change
// If empty after truncation and trimming, use a default value
if sanitized == "" {
	sanitized = "unknown"
}
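
One way to arrange the trimming, truncation, and fallback so that a single empty check at the end covers every path. This is a sketch under the assumption that invalid characters have already been replaced, so the input contains only ASCII letters, digits, and `-._`; the `finishLabelValue` name and the `maxLabelValueLen` constant are hypothetical.

```go
package main

import "strings"

// maxLabelValueLen is the Kubernetes limit on label value length.
const maxLabelValueLen = 63

// finishLabelValue trims non-alphanumeric edges, truncates to the
// Kubernetes limit, and falls back to "unknown" if nothing is left.
// It assumes s is ASCII, so slicing by byte index is safe.
func finishLabelValue(s string) string {
	s = strings.Trim(s, "-._")
	if len(s) > maxLabelValueLen {
		// Truncation can expose a separator at the end, so trim again.
		s = strings.TrimRight(s[:maxLabelValueLen], "-._")
	}
	if s == "" {
		return "unknown"
	}
	return s
}
```

Because the first trim guarantees an alphanumeric leading character, truncation alone cannot empty the string; the final check mainly catches inputs made up entirely of separators.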

"pod", pod.Name,
"labels", workflowLabels)

_, err = w.clientset.CoreV1().Pods(w.config.EphemeralRunnerSetNamespace).Update(ctx, pod, metav1.UpdateOptions{})

Copilot AI Sep 21, 2025

Calling Update() with the object returned by an earlier Get() can fail with a conflict error if another process modifies the pod in between. Consider using Patch() instead, or retrying on conflict with exponential backoff.
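
A sketch of the Patch() route, limited to building the request body with the standard library. The `labelMergePatch` helper is a hypothetical name. In client-go, the resulting bytes would be passed as the `data` argument of `Pods(namespace).Patch(ctx, name, types.MergePatchType, data, metav1.PatchOptions{})`; that call is not included here so the example stays free of external dependencies.

```go
package main

import "encoding/json"

// labelMergePatch returns a JSON merge patch body that sets only the
// given labels. It carries no resourceVersion, so it does not depend
// on a prior Get() of the pod.
func labelMergePatch(labels map[string]string) ([]byte, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"labels": labels,
		},
	}
	return json.Marshal(patch)
}
```

For the labels `{"github.com/job": "build"}` this produces `{"metadata":{"labels":{"github.com/job":"build"}}}`. A merge patch touches only the listed keys, so labels set by other controllers are left in place.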
