I'm facing an issue accessing the monitoring port (Prometheus metrics) of the Kubeflow Trainer controller manager built from the master branch. The port-forwarding instructions that worked in v1.9.0 no longer work in v2.0, and the port-forward fails with the following error.
Error:
$ kubectl port-forward -n kubeflow-system deployment/kubeflow-trainer-controller-manager 8080:8080
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
E0319 15:13:06.174141 3648479 portforward.go:424] "Unhandled Error" err="an error occurred forwarding 8080 -> 8080: error forwarding port 8080 to pod 71a61495b14b7ae8b610e860acd7bd7a0bd4beb7feb4bd66064ce245150ff339, uid : failed to execute portforward in network namespace \"/var/run/netns/cni-6a5c29c4-f90d-5175-3aeb-cdbca524d613\": failed to connect to localhost:8080 inside namespace \"71a61495b14b7ae8b610e860acd7bd7a0bd4beb7feb4bd66064ce245150ff339\", IPv4: dial tcp4 127.0.0.1:8080: connect: connection refused IPv6 dial tcp6 [::1]:8080: connect: connection refused "
error: lost connection to pod
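In the log, kubectl's local listener accepts the connection ("Handling connection for 8080"), but the dial to 127.0.0.1:8080 inside the pod's network namespace is refused. That means no process in the pod is listening on port 8080. The sketch below uses a hypothetical helper, `port_is_listening`, built only on the Python standard library, to show how a plain TCP connect tells a listening port apart from a closed one. If you run it against the local end of an active `kubectl port-forward`, it reports success even in this failure case, because kubectl itself accepts the connection before trying to forward it.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration. A refused connection
    (ECONNREFUSED), the condition reported inside the pod in the log
    above, means nothing is accepting connections on that address/port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Nothing is listening on this port.
        return False
    except OSError:
        # Timeouts, unreachable hosts, etc. are also treated as "not listening".
        return False
```

Usage: `port_is_listening("127.0.0.1", 8080)` is only meaningful when run where the target process should be listening. It does not help on the client side of a port-forward.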
Setup Instructions Followed:
Cluster setup:
Environment setup:
What did you expect to happen?
Prometheus metrics should be reachable at localhost:8080/metrics through the port-forward, as they were in v1.9.0.
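Here is a minimal sketch for checking that expectation from a script. It uses the hypothetical helpers `fetch_metrics` and `metric_names` and only the standard library. It assumes the metrics are served over plain HTTP at localhost:8080/metrics, as described above. Whether v2.0 still serves them on that port and scheme is exactly what the failing port-forward leaves open.

```python
from __future__ import annotations

import urllib.request


def fetch_metrics(url: str = "http://localhost:8080/metrics", timeout: float = 5.0) -> str:
    """Download the raw Prometheus text exposition from url (assumed plain HTTP)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(exposition: str) -> set[str]:
    """Return the distinct metric names in a Prometheus text-format payload.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. For each
    sample line, the name is the text before the first '{' or space.
    """
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Cut at whichever comes first: the label set or the value separator.
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        names.add(line[:end])
    return names
```

Usage: with the port-forward running, `metric_names(fetch_metrics())` returns the set of exposed metric names. If the endpoint is not reachable, `fetch_metrics` raises an exception.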
Environment
Kubernetes version:
Kubeflow Trainer version:
$ kubectl get pods -n kubeflow-system -l app.kubernetes.io/name=trainer -o jsonpath="{.items[*].spec.containers[*].image}"
ghcr.io/kubeflow/trainer/trainer-controller-manager:latest
Kubeflow Python SDK version:
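The jsonpath query in the environment section prints only the container images. The small sketch below reads the JSON output of the same `kubectl get pods` query and also lists each container's declared `containerPort`s, which may hint at which port the manager exposes in v2.0. It uses a hypothetical helper, `containers_summary`, and only the standard library. Declared ports are informational in Kubernetes: a process can listen on an undeclared port, and declaring a port does not make anything listen on it.

```python
from __future__ import annotations

import json


def containers_summary(pod_list_json: str) -> list[tuple[str, str, list[int]]]:
    """Summarize the JSON PodList printed by `kubectl get pods -o json`.

    Returns one (container name, image, declared containerPorts) tuple per
    container: the same images the jsonpath
    {.items[*].spec.containers[*].image} selects, plus declared ports.
    """
    pod_list = json.loads(pod_list_json)
    summary = []
    for pod in pod_list.get("items", []):
        for container in pod.get("spec", {}).get("containers", []):
            # containerPort is required in each ports entry of a Pod spec.
            ports = [p["containerPort"] for p in container.get("ports", [])]
            summary.append((container["name"], container.get("image", ""), ports))
    return summary
```

Usage: save the output of `kubectl get pods -n kubeflow-system -l app.kubernetes.io/name=trainer -o json` to `pods.json`, then call `containers_summary(open("pods.json").read())`.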