update: chart for deployment on ocp, kind, k8s #854
Conversation
Pull request overview
Updates deployment tooling and Helm chart configuration to better support OpenShift vs. kind/vanilla Kubernetes differences (Prometheus/Gateway defaults and TLS CA handling).
Changes:
- OpenShift: default to using the projected Service CA (`service-ca.crt`) and skip deploying the Prometheus/Gateway control plane.
- kind/k8s: switch WVA Prometheus TLS CA handling toward "mount existing Secret" rather than extracting certs to a file.
- Adjust the gateway installation prompt order in `deploy/install.sh`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| deploy/openshift/install.sh | Switches OpenShift CA approach to projected Service CA path; updates prerequisites messaging/tools. |
| deploy/openshift/README.md | Updates OpenShift gateway text and adds troubleshooting section. |
| deploy/kubernetes/install.sh | Adds PROM TLS secret/path variables and removes cert extraction. |
| deploy/kind-emulator/install.sh | Adds PROM TLS secret/path variables and removes cert extraction; adjusts logging. |
| deploy/install.sh | Stops --set-file CA injection; adds new TLS flags; reorders gateway prompt logic. |
| charts/workload-variant-autoscaler/values.yaml | Updates TLS defaults to OpenShift service-ca path and adds existingSecret/key. |
| charts/workload-variant-autoscaler/values-dev.yaml | Same TLS default changes for dev values. |
| charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml | Adds conditional Secret/ConfigMap mounting logic for the CA cert file. |
| Makefile | Forces OpenShift deploy target to skip Prometheus and gateway control plane. |
| --set wva.prometheus.baseURL=$PROMETHEUS_URL \ | ||
| --set wva.prometheus.monitoringNamespace=$MONITORING_NAMESPACE \ | ||
| --set wva.prometheus.tls.caCertPath=$PROM_TLS_CA_CERT_PATH \ | ||
| ${PROM_TLS_SECRET_NAME:+--set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME} \ | ||
| ${PROM_TLS_KEY:+--set wva.prometheus.tls.key=$PROM_TLS_KEY} \ |
deploy_prometheus_adapter() still requires PROM_CA_CERT_PATH to exist (it creates a prometheus-ca ConfigMap from it), but this PR removes the --set-file wva.prometheus.caCert=$PROM_CA_CERT_PATH path and the environment scripts no longer populate PROM_CA_CERT_PATH. As a result, installs will fail at the adapter step (default DEPLOY_PROMETHEUS_ADAPTER=true) unless you either (a) keep generating PROM_CA_CERT_PATH in deploy_wva_prerequisites(), or (b) update the adapter deployment to source the CA from the new TLS secret/path instead of PROM_CA_CERT_PATH.
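One hedged way to implement option (b) above is to rebuild the `prometheus-ca` ConfigMap directly from the TLS Secret, without ever writing a cert file to disk. This is a sketch only: `create_adapter_ca_configmap` is a hypothetical helper, and its default secret/key/namespace values are assumptions modeled on this PR's `PROM_TLS_SECRET_NAME`/`PROM_TLS_KEY` variables.

```shell
# Hypothetical sketch of option (b): source the adapter CA from the TLS Secret
# instead of the removed PROM_CA_CERT_PATH file. Secret/key/namespace defaults
# are assumptions mirroring this PR's PROM_TLS_SECRET_NAME / PROM_TLS_KEY.
create_adapter_ca_configmap() {
  local secret="${1:-prometheus-web-tls}" key="${2:-tls.crt}" ns="${3:-monitoring}"
  # Read only the certificate material out of the Secret and rebuild the
  # prometheus-ca ConfigMap the adapter already expects, without touching disk.
  kubectl get secret "$secret" -n "$ns" -o go-template="{{index .data \"$key\"}}" |
    base64 -d |
    kubectl create configmap prometheus-ca -n "$ns" \
      --from-file=ca.crt=/dev/stdin --dry-run=client -o yaml |
    kubectl apply -f -
}
```

This keeps the adapter step self-sufficient even after `PROM_CA_CERT_PATH` generation is removed from `deploy_wva_prerequisites()`.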
| --set wva.prometheus.tls.caCertPath=$PROM_TLS_CA_CERT_PATH \ | ||
| ${PROM_TLS_SECRET_NAME:+--set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME} \ | ||
| ${PROM_TLS_KEY:+--set wva.prometheus.tls.key=$PROM_TLS_KEY} \ |
The new Helm flags set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME, but the Prometheus TLS secret is created in $MONITORING_NAMESPACE (see env-specific scripts) while the controller runs in $WVA_NS. Secrets are namespace-scoped, so the controller pod won’t be able to mount this secret unless it is also created/copied into $WVA_NS (or WVA is deployed into the same namespace).
| @@ -28,10 +33,6 @@ check_specific_prerequisites() { | |||
| deploy_wva_prerequisites() { | |||
| log_info "Deploying Workload-Variant-Autoscaler prerequisites for Kubernetes..." | |||
| # Extract Prometheus CA certificate | |||
| log_info "Extracting Prometheus TLS certificate" | |||
| kubectl get secret $PROMETHEUS_SECRET_NAME -n $MONITORING_NAMESPACE -o jsonpath='{.data.tls\.crt}' | base64 -d > $PROM_CA_CERT_PATH | |||
| if [ "$SKIP_TLS_VERIFY" = "true" ]; then | |||
The Prometheus TLS secret is created in $MONITORING_NAMESPACE (see deploy_prometheus_stack()), but the chart mounts .Values.wva.prometheus.tls.existingSecret from the controller’s namespace ($WVA_NS). Because secrets can’t be mounted cross-namespace, this will break TLS CA mounting unless you also create/copy the secret into $WVA_NS (or switch back to a ConfigMap/inline caCert approach).
| # Prometheus TLS - mount existing secret directly (no extraction needed) | ||
| PROM_TLS_SECRET_NAME="prometheus-web-tls" | ||
| PROM_TLS_KEY="tls.crt" | ||
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value |
Typo in comment: differnt → different.
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value | |
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a different path than OCP default value |
| # Prometheus TLS - mount existing secret directly (no extraction needed) | ||
| PROM_TLS_SECRET_NAME="prometheus-web-tls" | ||
| PROM_TLS_KEY="tls.crt" | ||
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value |
Typo in comment: differnt → different.
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value | |
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a different path than OCP default value |
| insecureSkipVerify: true # Development: true, Production: false | ||
| caCertPath: "/etc/ssl/certs/prometheus-ca.crt" | ||
| # caCert: | # Uncomment and provide your CA certificate | ||
| # On OpenShift, service CA is auto-injected at this path | ||
| caCertPath: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt" | ||
| # Mount existing Secret for CA certificate (recommended) | ||
| # This eliminates the need for --set-file and certificate extraction | ||
| existingSecret: "" # Name of existing Secret (e.g., "prometheus-web-tls") | ||
| key: "tls.crt" # Name of the key in the Secret's data map containing the CA certificate (required when existingSecret is set) | ||
| # caCert: | # DEPRECATED: Use above existingSecret instead |
Same issue as values.yaml: values-dev.yaml now defaults caCertPath to the OpenShift-only service-ca.crt path. In dev/kind/k8s this file typically won’t exist, and the controller will still try to read/parse it even when insecureSkipVerify=true, causing startup failures. Consider using a non-OpenShift default here (or setting caCertPath: "" for dev when skipping verification).
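A non-OpenShift default could also be resolved at install time. The sketch below is an assumption, not part of this PR: a hypothetical `resolve_ca_cert_path` helper for the dev/kind install scripts that only keeps the OpenShift projected path when the file actually exists, falling back to empty so the controller never tries to parse a missing CA file.

```shell
# Sketch (hypothetical helper, not in this PR): only use the OpenShift
# projected service-ca path when it is actually present; otherwise return ""
# so dev/kind installs rely on existingSecret or insecureSkipVerify instead.
resolve_ca_cert_path() {
  local ocp_ca="${1:-/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt}"
  if [ -f "$ocp_ca" ]; then
    printf '%s' "$ocp_ca"    # OpenShift: projected Service CA exists
  else
    printf ''                # dev/kind/k8s: no projected CA file
  fi
}
```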
| PROMETHEUS_URL=${PROMETHEUS_URL:-"$PROMETHEUS_BASE_URL:$PROMETHEUS_PORT"} | ||
| PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"} | ||
| # Prometheus TLS - mount existing secret directly (no extraction needed) | ||
| PROM_TLS_SECRET_NAME="prometheus-web-tls" |
PROM_TLS_SECRET_NAME is hard-coded to prometheus-web-tls, but the secret actually created by this script uses $PROMETHEUS_SECRET_NAME (which is overrideable). If a user overrides PROMETHEUS_SECRET_NAME, WVA will try to mount a different (nonexistent) secret. Consider deriving PROM_TLS_SECRET_NAME from PROMETHEUS_SECRET_NAME (or honoring an override) to keep these consistent.
| PROM_TLS_SECRET_NAME="prometheus-web-tls" | |
| PROM_TLS_SECRET_NAME=${PROM_TLS_SECRET_NAME:-$PROMETHEUS_SECRET_NAME} |
| PROMETHEUS_SECRET_NAME="prometheus-web-tls" | ||
| # Prometheus TLS - mount existing secret directly (no extraction needed) | ||
| PROM_TLS_SECRET_NAME="prometheus-web-tls" |
PROM_TLS_SECRET_NAME is hard-coded to prometheus-web-tls, but the script creates the Prometheus TLS secret using $PROMETHEUS_SECRET_NAME. If $PROMETHEUS_SECRET_NAME changes, WVA will attempt to mount the wrong secret. It’s safer to set PROM_TLS_SECRET_NAME from $PROMETHEUS_SECRET_NAME (or support overriding PROM_TLS_SECRET_NAME).
| PROMETHEUS_SECRET_NAME="prometheus-web-tls" | |
| # Prometheus TLS - mount existing secret directly (no extraction needed) | |
| PROM_TLS_SECRET_NAME="prometheus-web-tls" | |
| PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"} | |
| # Prometheus TLS - mount existing secret directly (no extraction needed) | |
| PROM_TLS_SECRET_NAME=${PROM_TLS_SECRET_NAME:-$PROMETHEUS_SECRET_NAME} |
| If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation: | ||
| ```bash | ||
| oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml |
This new troubleshooting section refers to "Service CA ConfigMap creation" failures, but the OpenShift install script no longer creates a Service CA ConfigMap (it relies on the projected service-ca.crt instead). The steps/error message here should be updated to match the new behavior (e.g., troubleshooting missing projected service-ca.crt in the controller pod, or adapter CA configuration).
| If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation: | |
| ```bash | |
| oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml | |
| If the operator is running but the controller still fails with Service CA or TLS errors, verify that the projected `service-ca.crt` is present in the controller pod and that it is being used: | |
| ```bash | |
| # Find the controller manager pod | |
| oc get pods -n workload-variant-autoscaler-system | |
| # Check for the projected service-ca certificate inside the pod | |
| CONTROLLER_POD=$(oc get pods -n workload-variant-autoscaler-system -l control-plane=controller-manager -o jsonpath='{.items[0].metadata.name}') | |
| oc exec -n workload-variant-autoscaler-system "$CONTROLLER_POD" -- ls -l /var/run/secrets | |
| oc exec -n workload-variant-autoscaler-system "$CONTROLLER_POD" -- find /var/run/secrets -name 'service-ca.crt' -print | |
| # If external metrics or adapter connections are failing, also verify the Prometheus Adapter CA configuration | |
| oc get configmap -n openshift-user-workload-monitoring -l app=prometheus-adapter -o yaml |
| {{- if ne .Values.wva.prometheus.tls.caCertPath "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt" }} | ||
| - name: prometheus-ca-cert | ||
| mountPath: /etc/ssl/certs/prometheus-ca.crt | ||
| mountPath: {{ .Values.wva.prometheus.tls.caCertPath }} | ||
| {{- if .Values.wva.prometheus.tls.existingSecret }} | ||
| subPath: {{ .Values.wva.prometheus.tls.key }} | ||
| {{- else }} | ||
| subPath: ca.crt | ||
| {{- end }} | ||
| readOnly: true | ||
| {{- end }} |
The CA volume/volumeMount is skipped entirely when caCertPath equals /var/run/secrets/.../service-ca.crt, even if wva.prometheus.tls.existingSecret is set. This makes existingSecret silently ignored whenever a user forgets to also change caCertPath, and it also hard-codes OpenShift detection via a magic path string. Consider keying the conditional on whether you actually need to mount something (e.g., existingSecret set or inline caCert provided) rather than on a path equality check, and enforce that tls.key is non-empty when existingSecret is used.
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | ||
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ |
Copying a Secret by piping kubectl get ... -o yaml through sed will typically include cluster-generated fields (e.g., resourceVersion, uid, managedFields) that are invalid on create/apply in the target namespace, so this can fail even when the source Secret exists. Prefer recreating the Secret via kubectl get ... -o json and deleting server-managed metadata before applying, or use kubectl create secret ... --dry-run=client to rebuild it.
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | |
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \ | |
| jq --arg ns "$WVA_NS" ' | |
| .metadata.namespace = $ns | |
| | del( | |
| .metadata.uid, | |
| .metadata.resourceVersion, | |
| .metadata.creationTimestamp, | |
| .metadata.managedFields, | |
| .metadata.selfLink, | |
| .metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"] | |
| ) | |
| ' | \ |
| mountPath: /etc/ssl/certs/prometheus-ca.crt | ||
| mountPath: {{ .Values.wva.prometheus.tls.caCertPath }} | ||
| {{- if .Values.wva.prometheus.tls.existingSecret }} | ||
| subPath: {{ .Values.wva.prometheus.tls.key }} |
When wva.prometheus.tls.existingSecret is set, the template unconditionally renders subPath: {{ .Values.wva.prometheus.tls.key }}. If a user sets existingSecret but forgets key (or sets it to empty), Helm will render an invalid manifest/mount. Consider enforcing this with Helm's required function (or conditionally omitting the volumeMount when key is empty) so misconfiguration fails at template time.
| subPath: {{ .Values.wva.prometheus.tls.key }} | |
| subPath: {{ required "wva.prometheus.tls.key is required when wva.prometheus.tls.existingSecret is set" .Values.wva.prometheus.tls.key }} |
| ### Service CA ConfigMap Creation Failed | ||
| ```bash | ||
| [ERROR] ConfigMap creation failed - verify namespace permissions and OpenShift Service CA operator is running | ||
| ``` | ||
| **Solution**: Verify the Service CA operator is working: | ||
| ```bash | ||
| # Check Service CA operator status | ||
| oc get pods -n openshift-service-ca-operator | ||
| # Verify namespace exists | ||
| oc get namespace workload-variant-autoscaler-system | ||
| # Check for RBAC issues | ||
| oc auth can-i create configmaps -n workload-variant-autoscaler-system | ||
| ``` | ||
| If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation: | ||
| ```bash | ||
| oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml | ||
| ``` |
This new troubleshooting section refers to "Service CA ConfigMap creation" failures, but the OpenShift install flow no longer creates a Service CA ConfigMap and instead relies on the projected service-ca.crt volume. The README should be updated to match the new behavior (e.g., troubleshoot missing projected cert / SA permissions), or this section should be removed to avoid misleading users.
| PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"} | ||
| # Prometheus TLS - mount existing secret directly (no extraction needed) | ||
| PROM_TLS_SECRET_NAME="prometheus-web-tls" | ||
| PROM_TLS_KEY="tls.crt" | ||
| PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value |
PROM_TLS_SECRET_NAME is hardcoded to "prometheus-web-tls" even though PROMETHEUS_SECRET_NAME is configurable. If a user overrides PROMETHEUS_SECRET_NAME, the script will copy that Secret but Helm will still mount "prometheus-web-tls", leading to a missing Secret at runtime. Consider setting PROM_TLS_SECRET_NAME from PROMETHEUS_SECRET_NAME (or making it configurable in the same way).
| if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then | ||
| log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')" | ||
| if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then | ||
| log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')" |
The error message suggests running make $CONTAINER_TOOL-build, but the Makefile target is always docker-build (it already uses $(CONTAINER_TOOL) internally). With CONTAINER_TOOL=podman, this will point users to a non-existent make target. Update the message to reference make docker-build CONTAINER_TOOL=$CONTAINER_TOOL ... (or similar).
| log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')" | |
| log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')" |
| configMap: | ||
| name: prometheus-ca | ||
| secret: | ||
| secretName: prometheus-web-tls # Secret containing Prom TLS cert |
This values file hardcodes secretName: prometheus-web-tls, but deploy/install.sh allows overriding the Prometheus TLS secret name (PROMETHEUS_SECRET_NAME). If a user overrides it, Prometheus Adapter will still try to mount prometheus-web-tls and fail. Consider either documenting that the secret name must stay prometheus-web-tls, or wiring the secret name into the Helm install via --set so it stays consistent with PROMETHEUS_SECRET_NAME.
| secretName: prometheus-web-tls # Secret containing Prom TLS cert | |
| secretName: prometheus-web-tls # Secret containing Prom TLS cert; must match the Prometheus TLS Secret name (PROMETHEUS_SECRET_NAME in deploy/install.sh) |
| # Copy prometheus-web-tls Secret to WVA namespace for mounting | ||
| log_info "Copying prometheus-web-tls Secret to WVA namespace..." | ||
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | ||
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | ||
| kubectl apply -f - &> /dev/null | ||
| log_success "Secret copied to $WVA_NS namespace" |
Copying the prometheus-web-tls secret from the monitoring namespace into the WVA namespace via kubectl get secret ... -o yaml | sed ... | kubectl apply -f - duplicates the full TLS secret, including the private key, into a less-privileged namespace. Any compromise of workloads or serviceaccounts in WVA_NS that can read secrets would then allow an attacker to obtain Prometheus' TLS private key and impersonate the Prometheus endpoint or decrypt traffic. Instead, create a dedicated secret or ConfigMap that only contains the certificate/CA material required for verification and avoid propagating the private key into application namespaces.
| # Copy prometheus-web-tls Secret to WVA namespace for mounting | |
| log_info "Copying prometheus-web-tls Secret to WVA namespace..." | |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | |
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | |
| kubectl apply -f - &> /dev/null | |
| log_success "Secret copied to $WVA_NS namespace" | |
| # Create Prometheus TLS certificate Secret in WVA namespace without private key | |
| log_info "Creating Prometheus TLS certificate Secret in WVA namespace (without private key)..." | |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \ | |
| jq --arg ns "$WVA_NS" --arg name "$PROM_TLS_SECRET_NAME" --arg key "$PROM_TLS_KEY" ' | |
| . as $orig | |
| | { | |
| apiVersion: "v1", | |
| kind: "Secret", | |
| metadata: { | |
| name: $name, | |
| namespace: $ns | |
| }, | |
| type: "Opaque", | |
| data: { | |
| ($key): ( $orig.data["ca.crt"] // $orig.data["tls.crt"] ) | |
| } | |
| }' | \ | |
| kubectl apply -f - &> /dev/null | |
| log_success "Prometheus TLS certificate Secret prepared in $WVA_NS namespace" |
| # Copy prometheus-web-tls Secret to WVA namespace for mounting | ||
| log_info "Copying prometheus-web-tls Secret to WVA namespace..." | ||
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | ||
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | ||
| kubectl apply -f - &> /dev/null | ||
| log_success "Secret copied to $WVA_NS namespace" |
Even in the emulated Kind environment, copying the prometheus-web-tls secret verbatim into the WVA namespace replicates Prometheus' TLS private key into an additional namespace. If any pod or serviceaccount in WVA_NS is compromised, an attacker could read this secret and impersonate Prometheus or decrypt its TLS traffic, which is unnecessary since the controller only needs certificate/CA material for validation. Limit what is copied to just the certificate/CA data (no private key) or create a separate secret that omits tls.key for use by WVA.
| # Copy prometheus-web-tls Secret to WVA namespace for mounting | |
| log_info "Copying prometheus-web-tls Secret to WVA namespace..." | |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | |
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | |
| kubectl apply -f - &> /dev/null | |
| log_success "Secret copied to $WVA_NS namespace" | |
| # Create a Prometheus TLS validation secret in the WVA namespace without the private key | |
| log_info "Creating prometheus-web-tls validation Secret in WVA namespace (without private key)..." | |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \ | |
| jq ' | |
| del(.data["tls.key"]) | |
| | .metadata.namespace = env.WVA_NS | |
| | .metadata.resourceVersion = null | |
| | .metadata.uid = null | |
| | .metadata.creationTimestamp = null | |
| ' | \ | |
| kubectl apply -n "$WVA_NS" -f - &> /dev/null | |
| log_success "Validation Secret created in $WVA_NS namespace without private key" |
| if [ ! -d "$LLM_D_PROJECT" ]; then | ||
| log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | ||
| git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null |
This script automatically clones and trusts the external GitHub repository $LLM_D_OWNER/$LLM_D_PROJECT at the mutable tag $LLM_D_RELEASE, then later uses its manifests (for example via helmfile) to configure the cluster. If the upstream repo or the LLM_D_RELEASE tag is compromised or retagged, running this script could pull and apply malicious code to your cluster with the privileges of the current kube context. To reduce supply-chain risk, pin to an immutable commit SHA (or vendored content) and/or verify signatures or checksums before using external assets.
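Pinning to an immutable commit can be sketched as follows. This is an illustration only: `clone_pinned` and `LLM_D_PINNED_SHA` are hypothetical names; the real pin would live alongside `LLM_D_RELEASE` in the script.

```shell
# Supply-chain hardening sketch: check out a pinned, immutable commit instead
# of trusting a mutable release tag. clone_pinned is a hypothetical helper.
clone_pinned() {
  local repo_url="$1" dest="$2" pinned_sha="$3"
  git clone --quiet --no-checkout "$repo_url" "$dest" || return 1
  git -C "$dest" checkout --quiet --detach "$pinned_sha" || return 1
  # Fail closed if HEAD does not resolve to the pin (e.g., a retagged release)
  [ "$(git -C "$dest" rev-parse HEAD)" = "$pinned_sha" ]
}
```

Usage would look like `clone_pinned "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" "$LLM_D_PINNED_SHA"`, where the SHA is reviewed and committed to this repo.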
| if ! helm upgrade -i prometheus-adapter prometheus-community/prometheus-adapter \ | ||
| -n $MONITORING_NAMESPACE \ | ||
| -f "$values_file" \ | ||
| --set prometheus.url="$PROMETHEUS_BASE_URL" \ | ||
| --set prometheus.port="$PROMETHEUS_PORT" \ | ||
| --timeout=3m \ | ||
| --timeout=$timeout \ |
Installing Prometheus Adapter via helm upgrade -i prometheus-adapter prometheus-community/prometheus-adapter after adding the prometheus-community Helm repo pulls and executes a chart from an external repository without a pinned version or integrity verification. If the prometheus-community repo is compromised or the chart is updated with malicious templates, rerunning this script could silently deploy attacker-controlled workloads into your monitoring namespace. Prefer pinning a specific chart version (or OCI digest) and, where possible, verifying provenance/signatures or vendoring the chart artifacts.
| } | ||
|
|
Nit: extra line (probably because of go fmt). Unrelated to the PR.
Pull request overview
Copilot reviewed 19 out of 23 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
deploy/kind-emulator/install.sh:300
`SKIP_TLS_VERIFY` is set as a string elsewhere (e.g., "true"/"false"). This comparison uses `= true` (no quotes), which can misbehave if the variable contains "true" as a string. Match the Kubernetes install script and compare against the quoted literal (`"true"`).
if [ "$SKIP_TLS_VERIFY" = true ] || [ -f "$WVA_PROJECT/charts/workload-variant-autoscaler/values-dev.yaml" ]; then
log_warning "TLS verification NOT enabled: using values-dev.yaml for dev deployments"
VALUES_FILE="${WVA_PROJECT}/charts/workload-variant-autoscaler/values-dev.yaml"
else
| if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then | ||
| log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | ||
| return 1 | ||
| fi | ||
| log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" |
The git clone ... | grep -v ... pipeline makes the if condition depend on grep’s exit status (not git clone’s). If clone succeeds but its output is fully filtered, grep returns 1 and this path incorrectly reports a clone failure. Fix by checking git clone’s exit code directly and redirecting/suppressing stdout/stderr separately (or preserve the clone exit status with PIPESTATUS).
| if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then | |
| log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | |
| return 1 | |
| fi | |
| log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | |
| if git_output=$(git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" 2>&1); then | |
| printf '%s\n' "$git_output" | grep -v "Cloning into" || true | |
| log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | |
| else | |
| printf '%s\n' "$git_output" | grep -v "Cloning into" || true | |
| log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)" | |
| return 1 | |
| fi |
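The `PIPESTATUS` alternative mentioned in the comment can be sketched as below. `run_filtered` is a hypothetical helper (bash-specific, since `PIPESTATUS` is a bash array); in `install.sh` the wrapped command would be the `git clone` invocation.

```shell
# Alternative sketch using bash's PIPESTATUS: keep the grep filter on the
# output, but return the first command's exit code instead of grep's.
run_filtered() {
  "$@" 2>&1 | grep -v "Cloning into"
  return "${PIPESTATUS[0]}"   # status of "$@", not of grep
}
```

With this, `run_filtered git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" || log_error ...` reports failure only when the clone itself fails.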
| exit 1 | ||
| fi | ||
log_error (per shared context) already exits the script. The explicit exit 1 is unreachable and should be removed to avoid confusion about control flow.
| exit 1 | |
| fi | |
| fi |
| - Inappropriate SLOs for the deployed model and GPU types: update the `SLO_TPOT` and `SLO_TTFT` variables with appropriate SLOs given the model and employed GPU type | ||
| ### Service CA ConfigMap Creation Failed |
This troubleshooting section refers to Service CA ConfigMap creation failing, but the updated OpenShift install flow no longer creates a ConfigMap (it relies on the projected service CA file). This section should be updated/removed, or rewritten to troubleshoot projected service CA availability/mounting instead.
| # Create InferenceModel for second model (maps model name to pool) | ||
| # Note: InferenceModel CRD may not be available in all environments | ||
| # TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead |
Correct the typo 'InfereceModelRewrite' to 'InferenceModelRewrite' in the TODO comment.
| # TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead | |
| # TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead |
| ) | ||
| // Default to v1alpha2 group if empty | ||
| // Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty |
This comment is grammatically unclear and could be misread. Consider rewriting to a concise statement like: 'Default to v1 (llm-d defaults to v1 rather than v1alpha2) if empty' to improve maintainability.
| // Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty | |
| // Default to v1 (llm-d defaults to v1 rather than v1alpha2) if poolGroup is empty |
update: chart for deployment on ocp, kind, k8s

- update Makefile for ocp: we should not have the Prometheus stack or gateway created on ocp
- fix the order of the prompt warning asking whether to create the gateway
- update the helm template for handling the TLS cert
  - ocp: just use the annotation, which creates the cert in the mounted secret
  - kind/k8s: if caCertPath is set to a non-default value, use it; install.sh creates the secret prometheus-web-tls with tls.crt as the key inside, so there is no need to extract the secret's tls.crt
- add missing config to values-dev.yaml
- deprecate "caCert" in favor of the new caCertPath or existingSecret + key

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: cleanup

- remove cert extraction for the Prometheus adapter
- remove creation of the ConfigMap
- remove the reference in OCP

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: bump version for llm-d and all images + add support for podman

- use env variable LLM_D_RELEASE to control all images in deploy/install.sh
- clone llm-d locally based on the local version if it matches the required release version
- use env variable CONTAINER_TOOL to support podman on fedora
- remove/update *ignore files

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: k8s labels cannot contain "/"; sanitize MODEL_ID

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: add timeout for the Prometheus adapter + create the secret in the WVA namespace from the Prometheus namespace

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: github action e2e which uses "LLM_D_RELEASE:main"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: API bump

- old test was based on v1alpha2 of GIE for infpool
- new default is v1 for infpool

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: code review and go fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
| elif [ "$POOL_GROUP" = "inference.networking.x-k8s.io" ]; then | ||
| POOL_VERSION="v1alpha2" | ||
| else | ||
| log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" |
log_error is used before deploy/lib/cli.sh is sourced (it’s sourced later in the script). If POOL_GROUP is invalid, this will fail with command not found rather than printing the intended message. Use a plain echo >&2 here, or move POOL_GROUP validation after the deploy libs are sourced so log_error is available.
| log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" | |
| echo "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" >&2 |
| kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \ | ||
| sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \ | ||
| kubectl apply -f - &> /dev/null | ||
| log_success "Secret copied to $WVA_NS namespace" |
Copying a Secret by piping kubectl get ... -o yaml into kubectl apply typically fails because the YAML includes create-only / server-managed metadata (e.g., resourceVersion, uid, managedFields). This approach is likely to error when creating the Secret in the new namespace. Strip those metadata fields (or recreate the Secret from .data only) instead of applying the raw output.
Suggested change:

```diff
-kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
-    sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
-    kubectl apply -f - &> /dev/null
-log_success "Secret copied to $WVA_NS namespace"
+local secret_type
+local secret_data
+secret_type="$(kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o jsonpath='{.type}')" || {
+    log_error "Failed to read Secret type for $PROMETHEUS_SECRET_NAME from $MONITORING_NAMESPACE"
+    return 1
+}
+secret_data="$(kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o go-template='{{range $k, $v := .data}}{{printf "%s: %s\n" $k $v}}{{end}}')" || {
+    log_error "Failed to read Secret data for $PROMETHEUS_SECRET_NAME from $MONITORING_NAMESPACE"
+    return 1
+}
+if {
+    cat <<EOF
+apiVersion: v1
+kind: Secret
+metadata:
+  name: $PROMETHEUS_SECRET_NAME
+  namespace: $WVA_NS
+type: $secret_type
+data:
+$(printf '%s\n' "$secret_data" | sed 's/^/ /')
+EOF
+} | kubectl apply -f - &> /dev/null; then
+    log_success "Secret copied to $WVA_NS namespace"
+else
+    log_error "Failed to copy Secret $PROMETHEUS_SECRET_NAME to $WVA_NS namespace"
+    return 1
+fi
```
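A lighter-weight alternative to rebuilding the manifest by hand is to filter the server-managed fields out of the Secret JSON before re-applying it. This is a hedged sketch, not code from this PR: it assumes `jq` is available, and the function name is illustrative.

```shell
#!/usr/bin/env bash
# Sketch: drop create-only / server-managed metadata from a Secret's JSON and
# rewrite its namespace, so the result is safe to pipe into `kubectl apply`.
# Assumes jq is installed; not part of the PR's stated prerequisites.
strip_secret_metadata() {
  # $1: target namespace to rewrite into the manifest
  jq --arg ns "$1" '
    del(.metadata.resourceVersion,
        .metadata.uid,
        .metadata.creationTimestamp,
        .metadata.managedFields)
    | .metadata.namespace = $ns'
}
```

Usage would look like `kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | strip_secret_metadata "$WVA_NS" | kubectl apply -f -` (untested here, since it needs a live cluster).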
```sh
# Copy prometheus TLS Secret to WVA namespace for direct mounting (no extraction needed)
log_info "Copying $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
    sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
    kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
```
|
|
This copies the entire Prometheus TLS Secret into the WVA namespace. That Secret commonly contains `tls.key` (the Prometheus server private key), which WVA does not need and shouldn't be duplicating across namespaces. Consider creating a dedicated Secret/ConfigMap containing only the CA/public cert (e.g., only `tls.crt`) and mounting that instead.
Suggested change:

```diff
-# Copy prometheus TLS Secret to WVA namespace for direct mounting (no extraction needed)
-log_info "Copying $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
-kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
-    sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
-    kubectl apply -f - &> /dev/null
-log_success "Secret copied to $WVA_NS namespace"
+# Create a dedicated Secret in the WVA namespace containing only the public certificate.
+log_info "Copying tls.crt from $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
+local prometheus_tls_crt_file
+prometheus_tls_crt_file="$(mktemp)"
+if ! kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o jsonpath='{.data.tls\.crt}' | base64 --decode > "$prometheus_tls_crt_file"; then
+    rm -f "$prometheus_tls_crt_file"
+    log_error "Failed to extract tls.crt from Secret $PROMETHEUS_SECRET_NAME in namespace $MONITORING_NAMESPACE"
+    return 1
+fi
+if [ ! -s "$prometheus_tls_crt_file" ]; then
+    rm -f "$prometheus_tls_crt_file"
+    log_error "Secret $PROMETHEUS_SECRET_NAME in namespace $MONITORING_NAMESPACE does not contain tls.crt"
+    return 1
+fi
+if ! kubectl create secret generic "$PROMETHEUS_SECRET_NAME" \
+    -n "$WVA_NS" \
+    --from-file=tls.crt="$prometheus_tls_crt_file" \
+    --dry-run=client -o yaml | kubectl apply -f - &> /dev/null; then
+    rm -f "$prometheus_tls_crt_file"
+    log_error "Failed to create reduced Secret $PROMETHEUS_SECRET_NAME in namespace $WVA_NS"
+    return 1
+fi
+rm -f "$prometheus_tls_crt_file"
+log_success "Secret with tls.crt copied to $WVA_NS namespace"
```
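The comment also allows a ConfigMap instead of a Secret for the public cert. A minimal sketch of that variant, assuming the cert has already been extracted to a file (the names `prometheus-ca` and `ca.crt` are illustrative, not from this PR):

```shell
#!/usr/bin/env bash
# Sketch: emit a ConfigMap manifest that carries only the public CA
# certificate, so the Prometheus private key never leaves the monitoring
# namespace. Output is meant to be piped to `kubectl apply -f -`.
make_ca_configmap() {
  local name="$1" namespace="$2" cert_file="$3"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name
  namespace: $namespace
data:
  ca.crt: |
$(sed 's/^/    /' "$cert_file")
EOF
}
```

For example, `make_ca_configmap prometheus-ca "$WVA_NS" "$prometheus_tls_crt_file" | kubectl apply -f -` (not run here, since it needs a cluster). A ConfigMap also signals clearly that the contents are not sensitive.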
```diff
 # TLS security configuration
 tls:
   insecureSkipVerify: true # Development: true, Production: false
-  # caCertPath: "/etc/ssl/certs/prometheus-ca.crt" # Only used when caCert is provided
-  # caCert: | # Uncomment and provide your CA certificate
+  # On OpenShift, service CA is auto-injected at this path
+  caCertPath: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
+  # Mount existing Secret for CA certificate (recommended)
+  # This eliminates the need for --set-file and certificate extraction
+  existingSecret: "" # Name of existing Secret (e.g., "prometheus-web-tls" for kind or k8s)
+  key: "tls.crt" # Name of the key in the Secret's data map containing the CA certificate (required when existingSecret is set)
+  # caCert: | # DEPRECATED: Use above existingSecret instead
```
The new TLS configuration introduces `wva.prometheus.tls.caCertPath` / `existingSecret`, but the chart still only writes `PROMETHEUS_CA_CERT_PATH` into the controller config when `wva.prometheus.caCert` is set (see `templates/manager/wva-configmap.yaml`). This means OpenShift (and any TLS-verify setup using only `caCertPath`/`existingSecret`) won't configure the controller to use the mounted CA file. Update the config template to set `PROMETHEUS_CA_CERT_PATH` whenever `wva.prometheus.tls.caCertPath` is set (or whenever TLS verify is enabled), not only when `caCert` is provided.
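A sketch of the kind of condition being asked for in `wva-configmap.yaml`. The value names come from this diff, but the surrounding template structure and key placement are assumptions, so treat this as illustrative rather than a drop-in patch:

```yaml
{{- /* Set the CA path whenever a CA file is configured or mounted,
       not only when an inline caCert is provided. */}}
{{- if or .Values.wva.prometheus.caCert .Values.wva.prometheus.tls.existingSecret .Values.wva.prometheus.tls.caCertPath }}
PROMETHEUS_CA_CERT_PATH: {{ .Values.wva.prometheus.tls.caCertPath | quote }}
{{- end }}
```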
Changes

- `caCertPath` with a non-default value: `PROM_TLS_CA_CERT_PATH` is used by `install.sh`, which creates the secret `prometheus-web-tls` with `tls.crt` as the key inside, so there is no need to extract the secret's `tls.crt`.
- Deprecated `caCert` in favor of the new `caCertPath` or `existingSecret` + `key`.