Skip to content

update: chart for deployment on ocp, kind, k8s#854

Open
zdtsw wants to merge 2 commits intollm-d:mainfrom
zdtsw:chore_deploy
Open

update: chart for deployment on ocp, kind, k8s#854
zdtsw wants to merge 2 commits intollm-d:mainfrom
zdtsw:chore_deploy

Conversation

@zdtsw
Copy link
Copy Markdown
Collaborator

@zdtsw zdtsw commented Mar 6, 2026

Changes

  • update Makefile for ocp, we should not have prom stack or GW to be created on ocp
  • fix the order of prompt warning "if should create GW "
  • update helm template for handling tls cert
  1. ocp: just use annotation which creates the cert from the mounted secret
  2. kind/k8s: if set a caCerrPath with non-default value by PROM_TLS_CA_CERT_PATH, use it,
    by install.sh, it creates the secret prometheus-web-tls and with tls.crt as the key inside.
    no need extract secret's tls.crt
  • add missing config for values-dev.yaml
  • deprecate "caCert" instead of new caCertPath or existingSecret + key

Copilot AI review requested due to automatic review settings March 6, 2026 16:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates deployment tooling and Helm chart configuration to better support OpenShift vs. kind/vanilla Kubernetes differences (Prometheus/Gateway defaults and TLS CA handling).

Changes:

  • OpenShift: default to using the projected Service CA (service-ca.crt) and skip deploying Prometheus/Gateway control plane.
  • kind/k8s: switch WVA Prometheus TLS CA handling toward “mount existing Secret” rather than extracting certs to a file.
  • Adjust gateway installation prompt order in deploy/install.sh.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
deploy/openshift/install.sh Switches OpenShift CA approach to projected Service CA path; updates prerequisites messaging/tools.
deploy/openshift/README.md Updates OpenShift gateway text and adds troubleshooting section.
deploy/kubernetes/install.sh Adds PROM TLS secret/path variables and removes cert extraction.
deploy/kind-emulator/install.sh Adds PROM TLS secret/path variables and removes cert extraction; adjusts logging.
deploy/install.sh Stops --set-file CA injection; adds new TLS flags; reorders gateway prompt logic.
charts/workload-variant-autoscaler/values.yaml Updates TLS defaults to OpenShift service-ca path and adds existingSecret/key.
charts/workload-variant-autoscaler/values-dev.yaml Same TLS default changes for dev values.
charts/workload-variant-autoscaler/templates/manager/wva-deployment-controller-manager.yaml Adds conditional Secret/ConfigMap mounting logic for the CA cert file.
Makefile Forces OpenShift deploy target to skip Prometheus and gateway control plane.

Comment thread deploy/install.sh Outdated
Comment on lines +536 to +540
--set wva.prometheus.baseURL=$PROMETHEUS_URL \
--set wva.prometheus.monitoringNamespace=$MONITORING_NAMESPACE \
--set wva.prometheus.tls.caCertPath=$PROM_TLS_CA_CERT_PATH \
${PROM_TLS_SECRET_NAME:+--set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME} \
${PROM_TLS_KEY:+--set wva.prometheus.tls.key=$PROM_TLS_KEY} \
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deploy_prometheus_adapter() still requires PROM_CA_CERT_PATH to exist (it creates a prometheus-ca ConfigMap from it), but this PR removes the --set-file wva.prometheus.caCert=$PROM_CA_CERT_PATH path and the environment scripts no longer populate PROM_CA_CERT_PATH. As a result, installs will fail at the adapter step (default DEPLOY_PROMETHEUS_ADAPTER=true) unless you either (a) keep generating PROM_CA_CERT_PATH in deploy_wva_prerequisites(), or (b) update the adapter deployment to source the CA from the new TLS secret/path instead of PROM_CA_CERT_PATH.

Copilot uses AI. Check for mistakes.
Comment thread deploy/install.sh Outdated
Comment on lines +538 to +540
--set wva.prometheus.tls.caCertPath=$PROM_TLS_CA_CERT_PATH \
${PROM_TLS_SECRET_NAME:+--set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME} \
${PROM_TLS_KEY:+--set wva.prometheus.tls.key=$PROM_TLS_KEY} \
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Helm flags set wva.prometheus.tls.existingSecret=$PROM_TLS_SECRET_NAME, but the Prometheus TLS secret is created in $MONITORING_NAMESPACE (see env-specific scripts) while the controller runs in $WVA_NS. Secrets are namespace-scoped, so the controller pod won’t be able to mount this secret unless it is also created/copied into $WVA_NS (or WVA is deployed into the same namespace).

Copilot uses AI. Check for mistakes.
Comment thread deploy/kubernetes/install.sh Outdated
Comment on lines 32 to 36
@@ -28,10 +33,6 @@ check_specific_prerequisites() {
deploy_wva_prerequisites() {
log_info "Deploying Workload-Variant-Autoscaler prerequisites for Kubernetes..."

# Extract Prometheus CA certificate
log_info "Extracting Prometheus TLS certificate"
kubectl get secret $PROMETHEUS_SECRET_NAME -n $MONITORING_NAMESPACE -o jsonpath='{.data.tls\.crt}' | base64 -d > $PROM_CA_CERT_PATH

if [ "$SKIP_TLS_VERIFY" = "true" ]; then
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Prometheus TLS secret is created in $MONITORING_NAMESPACE (see deploy_prometheus_stack()), but the chart mounts .Values.wva.prometheus.tls.existingSecret from the controller’s namespace ($WVA_NS). Because secrets can’t be mounted cross-namespace, this will break TLS CA mounting unless you also create/copy the secret into $WVA_NS (or switch back to a ConfigMap/inline caCert approach).

Copilot uses AI. Check for mistakes.
Comment thread deploy/kubernetes/install.sh Outdated
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
PROM_TLS_KEY="tls.crt"
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: differntdifferent.

Suggested change
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a different path than OCP default value

Copilot uses AI. Check for mistakes.
Comment thread deploy/kind-emulator/install.sh Outdated
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
PROM_TLS_KEY="tls.crt"
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in comment: differntdifferent.

Suggested change
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a different path than OCP default value

Copilot uses AI. Check for mistakes.
Comment on lines 29 to +36
insecureSkipVerify: true # Development: true, Production: false
caCertPath: "/etc/ssl/certs/prometheus-ca.crt"
# caCert: | # Uncomment and provide your CA certificate
# On OpenShift, service CA is auto-injected at this path
caCertPath: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
# Mount existing Secret for CA certificate (recommended)
# This eliminates the need for --set-file and certificate extraction
existingSecret: "" # Name of existing Secret (e.g., "prometheus-web-tls")
key: "tls.crt" # Name of the key in the Secret's data map containing the CA certificate (required when existingSecret is set)
# caCert: | # DEPRECATED: Use above existingSecret instead
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as values.yaml: values-dev.yaml now defaults caCertPath to the OpenShift-only service-ca.crt path. In dev/kind/k8s this file typically won’t exist, and the controller will still try to read/parse it even when insecureSkipVerify=true, causing startup failures. Consider using a non-OpenShift default here (or setting caCertPath: "" for dev when skipping verification).

Copilot uses AI. Check for mistakes.
PROMETHEUS_URL=${PROMETHEUS_URL:-"$PROMETHEUS_BASE_URL:$PROMETHEUS_PORT"}
PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"}
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROM_TLS_SECRET_NAME is hard-coded to prometheus-web-tls, but the secret actually created by this script uses $PROMETHEUS_SECRET_NAME (which is overrideable). If a user overrides PROMETHEUS_SECRET_NAME, WVA will try to mount a different (nonexistent) secret. Consider deriving PROM_TLS_SECRET_NAME from PROMETHEUS_SECRET_NAME (or honoring an override) to keep these consistent.

Suggested change
PROM_TLS_SECRET_NAME="prometheus-web-tls"
PROM_TLS_SECRET_NAME=${PROM_TLS_SECRET_NAME:-$PROMETHEUS_SECRET_NAME}

Copilot uses AI. Check for mistakes.
Comment on lines 63 to +65
PROMETHEUS_SECRET_NAME="prometheus-web-tls"
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROM_TLS_SECRET_NAME is hard-coded to prometheus-web-tls, but the script creates the Prometheus TLS secret using $PROMETHEUS_SECRET_NAME. If $PROMETHEUS_SECRET_NAME changes, WVA will attempt to mount the wrong secret. It’s safer to set PROM_TLS_SECRET_NAME from $PROMETHEUS_SECRET_NAME (or support overriding PROM_TLS_SECRET_NAME).

Suggested change
PROMETHEUS_SECRET_NAME="prometheus-web-tls"
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"}
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME=${PROM_TLS_SECRET_NAME:-$PROMETHEUS_SECRET_NAME}

Copilot uses AI. Check for mistakes.
Comment on lines +354 to +357
If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation:

```bash
oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new troubleshooting section refers to "Service CA ConfigMap creation" failures, but the OpenShift install script no longer creates a Service CA ConfigMap (it relies on the projected service-ca.crt instead). The steps/error message here should be updated to match the new behavior (e.g., troubleshooting missing projected service-ca.crt in the controller pod, or adapter CA configuration).

Suggested change
If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation:
```bash
oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml
If the operator is running but the controller still fails with Service CA or TLS errors, verify that the projected `service-ca.crt` is present in the controller pod and that it is being used:
```bash
# Find the controller manager pod
oc get pods -n workload-variant-autoscaler-system
# Check for the projected service-ca certificate inside the pod
CONTROLLER_POD=$(oc get pods -n workload-variant-autoscaler-system -l control-plane=controller-manager -o jsonpath='{.items[0].metadata.name}')
oc exec -n workload-variant-autoscaler-system "$CONTROLLER_POD" -- ls -l /var/run/secrets
oc exec -n workload-variant-autoscaler-system "$CONTROLLER_POD" -- find /var/run/secrets -name 'service-ca.crt' -print
# If external metrics or adapter connections are failing, also verify the Prometheus Adapter CA configuration
oc get configmap -n openshift-user-workload-monitoring -l app=prometheus-adapter -o yaml

Copilot uses AI. Check for mistakes.
Comment on lines +104 to +113
{{- if ne .Values.wva.prometheus.tls.caCertPath "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt" }}
- name: prometheus-ca-cert
mountPath: /etc/ssl/certs/prometheus-ca.crt
mountPath: {{ .Values.wva.prometheus.tls.caCertPath }}
{{- if .Values.wva.prometheus.tls.existingSecret }}
subPath: {{ .Values.wva.prometheus.tls.key }}
{{- else }}
subPath: ca.crt
{{- end }}
readOnly: true
{{- end }}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CA volume/volumeMount is skipped entirely when caCertPath equals /var/run/secrets/.../service-ca.crt, even if wva.prometheus.tls.existingSecret is set. This makes existingSecret silently ignored whenever a user forgets to also change caCertPath, and it also hard-codes OpenShift detection via a magic path string. Consider keying the conditional on whether you actually need to mount something (e.g., existingSecret set or inline caCert provided) rather than on a path equality check, and enforce that tls.key is non-empty when existingSecret is used.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings March 7, 2026 19:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 13 comments.

Comment thread deploy/kubernetes/install.sh Outdated
Comment on lines +46 to +47
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying a Secret by piping kubectl get ... -o yaml through sed will typically include cluster-generated fields (e.g., resourceVersion, uid, managedFields) that are invalid on create/apply in the target namespace, so this can fail even when the source Secret exists. Prefer recreating the Secret via kubectl get ... -o json and deleting server-managed metadata before applying, or use kubectl create secret ... --dry-run=client to rebuild it.

Suggested change
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \
jq --arg ns "$WVA_NS" '
.metadata.namespace = $ns
| del(
.metadata.uid,
.metadata.resourceVersion,
.metadata.creationTimestamp,
.metadata.managedFields,
.metadata.selfLink,
.metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"]
)
' | \

Copilot uses AI. Check for mistakes.
mountPath: /etc/ssl/certs/prometheus-ca.crt
mountPath: {{ .Values.wva.prometheus.tls.caCertPath }}
{{- if .Values.wva.prometheus.tls.existingSecret }}
subPath: {{ .Values.wva.prometheus.tls.key }}
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When wva.prometheus.tls.existingSecret is set, the template unconditionally renders subPath: {{ .Values.wva.prometheus.tls.key }}. If a user sets existingSecret but forgets key (or sets it to empty), Helm will render an invalid manifest/mount. Consider enforcing this with Helm's required function (or conditionally omitting the volumeMount when key is empty) so misconfiguration fails at template time.

Suggested change
subPath: {{ .Values.wva.prometheus.tls.key }}
subPath: {{ required "wva.prometheus.tls.key is required when wva.prometheus.tls.existingSecret is set" .Values.wva.prometheus.tls.key }}

Copilot uses AI. Check for mistakes.
Comment on lines +335 to +358
### Service CA ConfigMap Creation Failed

```bash
[ERROR] ConfigMap creation failed - verify namespace permissions and OpenShift Service CA operator is running
```

**Solution**: Verify the Service CA operator is working:

```bash
# Check Service CA operator status
oc get pods -n openshift-service-ca-operator

# Verify namespace exists
oc get namespace workload-variant-autoscaler-system

# Check for RBAC issues
oc auth can-i create configmaps -n workload-variant-autoscaler-system
```

If the operator is running but ConfigMap creation still fails, manually verify the service-ca.crt annotation:

```bash
oc get configmap openshift-service-ca.crt -n workload-variant-autoscaler-system -o yaml
```
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new troubleshooting section refers to "Service CA ConfigMap creation" failures, but the OpenShift install flow no longer creates a Service CA ConfigMap and instead relies on the projected service-ca.crt volume. The README should be updated to match the new behavior (e.g., troubleshoot missing projected cert / SA permissions), or this section should be removed to avoid misleading users.

Copilot uses AI. Check for mistakes.
Comment thread deploy/kubernetes/install.sh Outdated
Comment on lines +20 to +24
PROMETHEUS_SECRET_NAME=${PROMETHEUS_SECRET_NAME:-"prometheus-web-tls"}
# Prometheus TLS - mount existing secret directly (no extraction needed)
PROM_TLS_SECRET_NAME="prometheus-web-tls"
PROM_TLS_KEY="tls.crt"
PROM_TLS_CA_CERT_PATH="/etc/ssl/certs/prometheus-ca.crt" # need a differnt path than OCP default value
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PROM_TLS_SECRET_NAME is hardcoded to "prometheus-web-tls" even though PROMETHEUS_SECRET_NAME is configurable. If a user overrides PROMETHEUS_SECRET_NAME, the script will copy that Secret but Helm will still mount "prometheus-web-tls", leading to a missing Secret at runtime. Consider setting PROM_TLS_SECRET_NAME from PROMETHEUS_SECRET_NAME (or making it configurable in the same way).

Copilot uses AI. Check for mistakes.
if ! docker image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
if ! $CONTAINER_TOOL image inspect "$WVA_IMAGE_REPO:$WVA_IMAGE_TAG" >/dev/null 2>&1; then
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message suggests running make $CONTAINER_TOOL-build, but the Makefile target is always docker-build (it already uses $(CONTAINER_TOOL) internally). With CONTAINER_TOOL=podman, this will point users to a non-existent make target. Update the message to reference make docker-build CONTAINER_TOOL=$CONTAINER_TOOL ... (or similar).

Suggested change
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make $CONTAINER_TOOL-build IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"
log_error "Image '$WVA_IMAGE_REPO:$WVA_IMAGE_TAG' not found locally - Please build the image first (e.g., 'make docker-build CONTAINER_TOOL=$CONTAINER_TOOL IMG=$WVA_IMAGE_REPO:$WVA_IMAGE_TAG')"

Copilot uses AI. Check for mistakes.
configMap:
name: prometheus-ca
secret:
secretName: prometheus-web-tls # Secret containing Prom TLS cert
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This values file hardcodes secretName: prometheus-web-tls, but deploy/install.sh allows overriding the Prometheus TLS secret name (PROMETHEUS_SECRET_NAME). If a user overrides it, Prometheus Adapter will still try to mount prometheus-web-tls and fail. Consider either documenting that the secret name must stay prometheus-web-tls, or wiring the secret name into the Helm install via --set so it stays consistent with PROMETHEUS_SECRET_NAME.

Suggested change
secretName: prometheus-web-tls # Secret containing Prom TLS cert
secretName: prometheus-web-tls # Secret containing Prom TLS cert; must match the Prometheus TLS Secret name (PROMETHEUS_SECRET_NAME in deploy/install.sh)

Copilot uses AI. Check for mistakes.
Comment thread deploy/kubernetes/install.sh Outdated
Comment on lines +44 to +49
# Copy prometheus-web-tls Secret to WVA namespace for mounting
log_info "Copying prometheus-web-tls Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying the prometheus-web-tls secret from the monitoring namespace into the WVA namespace via kubectl get secret ... -o yaml | sed ... | kubectl apply -f - duplicates the full TLS secret, including the private key, into a less-privileged namespace. Any compromise of workloads or serviceaccounts in WVA_NS that can read secrets would then allow an attacker to obtain Prometheus' TLS private key and impersonate the Prometheus endpoint or decrypt traffic. Instead, create a dedicated secret or ConfigMap that only contains the certificate/CA material required for verification and avoid propagating the private key into application namespaces.

Suggested change
# Copy prometheus-web-tls Secret to WVA namespace for mounting
log_info "Copying prometheus-web-tls Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
# Create Prometheus TLS certificate Secret in WVA namespace without private key
log_info "Creating Prometheus TLS certificate Secret in WVA namespace (without private key)..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \
jq --arg ns "$WVA_NS" --arg name "$PROM_TLS_SECRET_NAME" --arg key "$PROM_TLS_KEY" '
. as $orig
| {
apiVersion: "v1",
kind: "Secret",
metadata: {
name: $name,
namespace: $ns
},
type: "Opaque",
data: {
($key): ( $orig.data["ca.crt"] // $orig.data["tls.crt"] )
}
}' | \
kubectl apply -f - &> /dev/null
log_success "Prometheus TLS certificate Secret prepared in $WVA_NS namespace"

Copilot uses AI. Check for mistakes.
Comment thread deploy/kind-emulator/install.sh Outdated
Comment on lines +304 to +309
# Copy prometheus-web-tls Secret to WVA namespace for mounting
log_info "Copying prometheus-web-tls Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even in the emulated Kind environment, copying the prometheus-web-tls secret verbatim into the WVA namespace replicates Prometheus' TLS private key into an additional namespace. If any pod or serviceaccount in WVA_NS is compromised, an attacker could read this secret and impersonate Prometheus or decrypt its TLS traffic, which is unnecessary since the controller only needs certificate/CA material for validation. Limit what is copied to just the certificate/CA data (no private key) or create a separate secret that omits tls.key for use by WVA.

Suggested change
# Copy prometheus-web-tls Secret to WVA namespace for mounting
log_info "Copying prometheus-web-tls Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
# Create a Prometheus TLS validation secret in the WVA namespace without the private key
log_info "Creating prometheus-web-tls validation Secret in WVA namespace (without private key)..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o json | \
jq '
del(.data["tls.key"])
| .metadata.namespace = env.WVA_NS
| .metadata.resourceVersion = null
| .metadata.uid = null
| .metadata.creationTimestamp = null
' | \
kubectl apply -n "$WVA_NS" -f - &> /dev/null
log_success "Validation Secret created in $WVA_NS namespace without private key"

Copilot uses AI. Check for mistakes.
Comment thread deploy/install.sh Outdated

if [ ! -d "$LLM_D_PROJECT" ]; then
log_info "Cloning $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
git clone -b $LLM_D_RELEASE -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git $LLM_D_PROJECT &> /dev/null
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script automatically clones and trusts the external GitHub repository $LLM_D_OWNER/$LLM_D_PROJECT at the mutable tag $LLM_D_RELEASE, then later uses its manifests (for example via helmfile) to configure the cluster. If the upstream repo or the LLM_D_RELEASE tag is compromised or retagged, running this script could pull and apply malicious code to your cluster with the privileges of the current kube context. To reduce supply-chain risk, pin to an immutable commit SHA (or vendored content) and/or verify signatures or checksums before using external assets.

Copilot uses AI. Check for mistakes.
Comment thread deploy/install.sh Outdated
Comment on lines +1106 to +1111
if ! helm upgrade -i prometheus-adapter prometheus-community/prometheus-adapter \
-n $MONITORING_NAMESPACE \
-f "$values_file" \
--set prometheus.url="$PROMETHEUS_BASE_URL" \
--set prometheus.port="$PROMETHEUS_PORT" \
--timeout=3m \
--timeout=$timeout \
Copy link

Copilot AI Mar 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installing Prometheus Adapter via helm upgrade -i prometheus-adapter prometheus-community/prometheus-adapter after adding the prometheus-community Helm repo pulls and executes a chart from an external repository without a pinned version or integrity verification. If the prometheus-community repo is compromised or the chart is updated with malicious templates, rerunning this script could silently deploy attacker-controlled workloads into your monitoring namespace. Prefer pinning a specific chart version (or OCI digest) and, where possible, verifying provenance/signatures or vendoring the chart artifacts.

Copilot uses AI. Check for mistakes.
Comment on lines +65 to +66
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra line (probably because of go fmt). Unrelated to the PR.

Copilot AI review requested due to automatic review settings March 23, 2026 12:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 19 out of 23 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

deploy/kind-emulator/install.sh:300

  • SKIP_TLS_VERIFY is set as a string elsewhere (e.g., "true"/"false"). This comparison uses = true (no quotes), which can misbehave if the variable contains "true" as a string. Match the Kubernetes install script and compare against the quoted literal (\"true\").
    if [ "$SKIP_TLS_VERIFY" = true ] ||  [ -f "$WVA_PROJECT/charts/workload-variant-autoscaler/values-dev.yaml" ]; then
        log_warning "TLS verification NOT enabled: using values-dev.yaml for dev deployments"
        VALUES_FILE="${WVA_PROJECT}/charts/workload-variant-autoscaler/values-dev.yaml"
    else

Comment thread deploy/install.sh Outdated
Comment on lines +871 to +875
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git clone ... | grep -v ... pipeline makes the if condition depend on grep’s exit status (not git clone’s). If clone succeeds but its output is fully filtered, grep returns 1 and this path incorrectly reports a clone failure. Fix by checking git clone’s exit code directly and redirecting/suppressing stdout/stderr separately (or preserve the clone exit status with PIPESTATUS).

Suggested change
if ! git clone -b "$LLM_D_RELEASE" -- https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git "$LLM_D_PROJECT" 2>&1 | grep -v "Cloning into"; then
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi
log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
if git_output=$(git clone -b "$LLM_D_RELEASE" -- "https://github.com/$LLM_D_OWNER/$LLM_D_PROJECT.git" "$LLM_D_PROJECT" 2>&1); then
printf '%s\n' "$git_output" | grep -v "Cloning into" || true
log_success "Successfully cloned $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
else
printf '%s\n' "$git_output" | grep -v "Cloning into" || true
log_error "Failed to clone $LLM_D_PROJECT repository (release: $LLM_D_RELEASE)"
return 1
fi

Copilot uses AI. Check for mistakes.
Comment thread deploy/install.sh
Comment on lines +56 to 58
exit 1
fi

Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log_error (per shared context) already exits the script. The explicit exit 1 is unreachable and should be removed to avoid confusion about control flow.

Suggested change
exit 1
fi
fi

Copilot uses AI. Check for mistakes.

- Inappropriate SLOs for the deployed model and GPU types: update the `SLO_TPOT` and `SLO_TTFT` variables with appropriate SLOs given the model and employed GPU type

### Service CA ConfigMap Creation Failed
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This troubleshooting section refers to Service CA ConfigMap creation failing, but the updated OpenShift install flow no longer creates a ConfigMap (it relies on the projected service CA file). This section should be updated/removed, or rewritten to troubleshoot projected service CA availability/mounting instead.

Copilot uses AI. Check for mistakes.
Comment thread deploy/install.sh Outdated

# Create InferenceModel for second model (maps model name to pool)
# Note: InferenceModel CRD may not be available in all environments
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the typo 'InfereceModelRewrite' to 'InferenceModelRewrite' in the TODO comment.

Suggested change
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InfereceModelRewrite instead
# TODO: InferenceModel only exists in inference.networking.x-k8s.io/v1alpha2, should use InferenceModelRewrite instead

Copilot uses AI. Check for mistakes.
)

// Default to v1alpha2 group if empty
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
Copy link

Copilot AI Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is grammatically unclear and could be misread. Consider rewriting to a concise statement like: 'Default to v1 (llm-d defaults to v1 rather than v1alpha2) if empty' to improve maintainability.

Suggested change
// Default to v1 (since llm-d already by default use v1 than v1alpha2) if empty
// Default to v1 (llm-d defaults to v1 rather than v1alpha2) if poolGroup is empty

Copilot uses AI. Check for mistakes.
- update on Makefile for ocp, we should not have prom stack or GW
  created on ocp
- fix the order of prompt warning if should create GW
- update helm template for handling tls cert
  ocp: just use annotation which creates the cert in the mounted secret-
  kind/k8s: if set a caCerrPath with non-default value, use it, by
  install.sh, it creates the secret prometheus-web-tls and with tls.crt
  as the key inside. no need extract secret's tls.crt
- add missing config for values-dev.yaml
- deprecate "caCert" instead of new caCertPath or existingSecret + key

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: cleanup

- remove cert extraction for prom adaptor
- remove creation of configmap
- remove reference in OCP

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: bump version for llm-d and all images + add support for podman

- use env variable LLM_D_RELEASE to control all image in the
  deploy/install.sh
- clone llm-d to local based on local version if match required release
  version
- use env variable CONTAINER_TOOL to support podmano on fedora
- remove/update *ignore files

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: k8s label cannot have "/" sanitize MODLE_ID

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: add timeout for prom adaptor + create secret in wva namespace from
prom ns

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: github action e2e which uses "LLM_D_RELEASE:main"

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

fix: API bump

- old test was based on v1alph2 of GIE for infpool
- new default is on v1 for infpool

Signed-off-by: Wen Zhou <wenzhou@redhat.com>

update: code review and go fmt

Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Copilot AI review requested due to automatic review settings April 7, 2026 15:22
Signed-off-by: Wen Zhou <wenzhou@redhat.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 24 changed files in this pull request and generated 4 comments.

Comment thread deploy/install.sh
elif [ "$POOL_GROUP" = "inference.networking.x-k8s.io" ]; then
POOL_VERSION="v1alpha2"
else
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log_error is used before deploy/lib/cli.sh is sourced (it’s sourced later in the script). If POOL_GROUP is invalid, this will fail with command not found rather than printing the intended message. Use a plain echo >&2 here, or move POOL_GROUP validation after the deploy libs are sourced so log_error is available.

Suggested change
log_error "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)"
echo "Unknown POOL_GROUP: $POOL_GROUP (expected inference.networking.k8s.io or inference.networking.x-k8s.io)" >&2

Copilot uses AI. Check for mistakes.
Comment thread deploy/lib/infra_wva.sh
Comment on lines +179 to +182
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying a Secret by piping kubectl get ... -o yaml into kubectl apply typically fails because the YAML includes create-only / server-managed metadata (e.g., resourceVersion, uid, managedFields). This approach is likely to error when creating the Secret in the new namespace. Strip those metadata fields (or recreate the Secret from .data only) instead of applying the raw output.

Suggested change
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
local secret_type
local secret_data
secret_type="$(kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o jsonpath='{.type}')" || {
log_error "Failed to read Secret type for $PROMETHEUS_SECRET_NAME from $MONITORING_NAMESPACE"
return 1
}
secret_data="$(kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o go-template='{{range $k, $v := .data}}{{printf "%s: %s\n" $k $v}}{{end}}')" || {
log_error "Failed to read Secret data for $PROMETHEUS_SECRET_NAME from $MONITORING_NAMESPACE"
return 1
}
if {
cat <<EOF
apiVersion: v1
kind: Secret
metadata:
name: $PROMETHEUS_SECRET_NAME
namespace: $WVA_NS
type: $secret_type
data:
$(printf '%s\n' "$secret_data" | sed 's/^/ /')
EOF
} | kubectl apply -f - &> /dev/null; then
log_success "Secret copied to $WVA_NS namespace"
else
log_error "Failed to copy Secret $PROMETHEUS_SECRET_NAME to $WVA_NS namespace"
return 1
fi

Copilot uses AI. Check for mistakes.
Comment thread deploy/lib/infra_wva.sh
Comment on lines +177 to 183
# Copy prometheus TLS Secret to WVA namespace for direct mounting (no extraction needed)
log_info "Copying $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"

Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This copies the entire Prometheus TLS Secret into the WVA namespace. That Secret commonly contains tls.key (the Prometheus server private key), which WVA does not need and shouldn’t be duplicating across namespaces. Consider creating a dedicated Secret/ConfigMap containing only the CA/public cert (e.g., only tls.crt) and mount that instead.

Suggested change
# Copy prometheus TLS Secret to WVA namespace for direct mounting (no extraction needed)
log_info "Copying $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o yaml | \
sed "s/namespace: $MONITORING_NAMESPACE/namespace: $WVA_NS/" | \
kubectl apply -f - &> /dev/null
log_success "Secret copied to $WVA_NS namespace"
# Create a dedicated Secret in the WVA namespace containing only the public certificate.
log_info "Copying tls.crt from $PROMETHEUS_SECRET_NAME Secret to WVA namespace..."
local prometheus_tls_crt_file
prometheus_tls_crt_file="$(mktemp)"
if ! kubectl get secret "$PROMETHEUS_SECRET_NAME" -n "$MONITORING_NAMESPACE" -o jsonpath='{.data.tls\.crt}' | base64 --decode > "$prometheus_tls_crt_file"; then
rm -f "$prometheus_tls_crt_file"
log_error "Failed to extract tls.crt from Secret $PROMETHEUS_SECRET_NAME in namespace $MONITORING_NAMESPACE"
return 1
fi
if [ ! -s "$prometheus_tls_crt_file" ]; then
rm -f "$prometheus_tls_crt_file"
log_error "Secret $PROMETHEUS_SECRET_NAME in namespace $MONITORING_NAMESPACE does not contain tls.crt"
return 1
fi
if ! kubectl create secret generic "$PROMETHEUS_SECRET_NAME" \
-n "$WVA_NS" \
--from-file=tls.crt="$prometheus_tls_crt_file" \
--dry-run=client -o yaml | kubectl apply -f - &> /dev/null; then
rm -f "$prometheus_tls_crt_file"
log_error "Failed to create reduced Secret $PROMETHEUS_SECRET_NAME in namespace $WVA_NS"
return 1
fi
rm -f "$prometheus_tls_crt_file"
log_success "Secret with tls.crt copied to $WVA_NS namespace"

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +49
# TLS security configuration
tls:
insecureSkipVerify: true # Development: true, Production: false
# caCertPath: "/etc/ssl/certs/prometheus-ca.crt" # Only used when caCert is provided
# caCert: | # Uncomment and provide your CA certificate
# On OpenShift, service CA is auto-injected at this path
caCertPath: "/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt"
# Mount existing Secret for CA certificate (recommended)
# This eliminates the need for --set-file and certificate extraction
existingSecret: "" # Name of existing Secret (e.g., "prometheus-web-tls" for kind or k8s)
key: "tls.crt" # Name of the key in the Secret's data map containing the CA certificate (required when existingSecret is set)
# caCert: | # DEPRECATED: Use above existingSecret instead
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new TLS configuration introduces wva.prometheus.tls.caCertPath / existingSecret, but the chart still only writes PROMETHEUS_CA_CERT_PATH into the controller config when wva.prometheus.caCert is set (see templates/manager/wva-configmap.yaml). This means OpenShift (and any TLS-verify setup using only caCertPath/existingSecret) won’t configure the controller to use the mounted CA file. Update the config template to set PROMETHEUS_CA_CERT_PATH whenever wva.prometheus.tls.caCertPath is set (or whenever TLS verify is enabled), not only when caCert is provided.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants