Operator that deploys and manages all the telemetry agents both in the podified control plane and in the external dataplane

openstack-k8s-operators/telemetry-operator

telemetry-operator

The Telemetry Operator deploys all the agents needed to gather telemetry for assessing the full state of a running OpenStack cluster.

Description

This operator deploys multiple telemetry agents, both in the control plane and on the dataplane nodes.

Dev setup

1.- Deploy crc:

cd install_yamls/devsetup
CPUS=12 MEMORY=25600 DISK=100 make crc

2.- Create edpm nodes

make crc_attach_default_interface

EDPM_TOTAL_NODES=2 make edpm_compute

3.- Deploy openstack-operator and openstack

cd ..
make crc_storage
make input

make openstack
make openstack_init
make openstack_deploy

4.- Deploy dataplane operator

DATAPLANE_TOTAL_NODES=2 DATAPLANE_NTP_SERVER=clock.redhat.com make edpm_deploy

To know when dataplane-operator has finished, watch the "*-edpm" pods that it spawns to run Ansible on the compute nodes. They appear one after another; once no new ones appear, the deployment is finished.

You can also make your process wait until everything finishes:

DATAPLANE_TOTAL_NODES=2 DATAPLANE_NTP_SERVER=clock.redhat.com make edpm_wait_deploy
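If you prefer to watch the pods by hand, a small filter can make the "*-edpm" pods easier to spot. The following is a minimal sketch, not part of install_yamls: the helper name edpm_pods is hypothetical, it assumes the pods run in the openstack namespace, and it matches any pod whose name contains "-edpm".

```shell
# Read `oc get pods` output on stdin and print the name and STATUS column
# of every pod whose name contains "-edpm" (assumed naming pattern).
edpm_pods() {
    awk '$1 ~ /-edpm/ { print $1, $3 }'
}

# Usage (the "openstack" namespace is an assumption):
#   oc get pods -n openstack | edpm_pods
```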

5.- Refresh Nova discover hosts

make edpm_nova_discover_hosts

Now, we proceed to run our own telemetry-operator instance:

6.- Remove Telemetry deployment

oc patch openstackcontrolplane openstack-galera-network-isolation --type='json' -p='[{"op": "replace", "path": "/spec/telemetry/enabled", "value":false}]'

7.- Remove telemetry-operator from the deployments

oc project openstack-operators
oc delete csv telemetry-operator.v0.0.1
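The CSV version suffix may differ from v0.0.1 in your environment. As a hedged sketch, the installed name can be looked up instead of hard-coded; the helper name telemetry_csv is hypothetical and it assumes the CSV name starts with "telemetry-operator.v".

```shell
# Read `oc get csv -o name` output on stdin and print the first entry
# that belongs to telemetry-operator (assumed name prefix).
telemetry_csv() {
    grep -E '/telemetry-operator\.v' | head -n 1
}

# Usage:
#   csv=$(oc get csv -n openstack-operators -o name | telemetry_csv)
#   [ -n "$csv" ] && oc delete -n openstack-operators "$csv"
```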

8.- Deploy custom telemetry-operator version

NOTE: If you intend to deploy a custom telemetry object with pre-populated image URLs, you can use make run instead of make run-with-webhook, because the webhooks will not be required.

cd telemetry-operator

oc delete -f config/crd/bases/
oc apply -f config/crd/bases/

make manifests generate
OPERATOR_TEMPLATES=$PWD/templates make run-with-webhook

9.- Deploy Telemetry

There are two options: either let openstack-operator manage a telemetry object, or disable openstack-operator and manage it yourself.

9a.- To continue running openstack-operator and use a telemetry object under its control, re-enable telemetry in the OpenStackControlPlane (oscp):

oc patch openstackcontrolplane openstack-galera-network-isolation --type='json' -p='[{"op": "replace", "path": "/spec/telemetry/enabled", "value":true}]'

9b.- To disable openstack-operator and use a custom telemetry object:

oc scale deploy/openstack-operator-controller-manager --replicas=0 -n openstack-operators
oc apply -f config/samples/telemetry_v1beta1_telemetry.yaml
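Steps 6 and 9a apply the same JSON patch with opposite values. A minimal sketch of a helper that builds the patch and rejects anything other than true or false is shown here; the function name telemetry_patch is hypothetical, while the patch path and the control plane name are the ones used in the steps above.

```shell
# Print the JSON patch that sets /spec/telemetry/enabled to true or false.
telemetry_patch() {
    case "$1" in
        true|false) ;;
        *) echo "usage: telemetry_patch true|false" >&2; return 1 ;;
    esac
    printf '[{"op": "replace", "path": "/spec/telemetry/enabled", "value": %s}]\n' "$1"
}

# Usage:
#   oc patch openstackcontrolplane openstack-galera-network-isolation \
#       --type=json -p "$(telemetry_patch false)"
```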

Run custom telemetry-operator bundle as part of openstack-operator

Sometimes deploying a dev environment with the existing telemetry-operator and then replacing it is not enough. For example, when changing the API, it is always good to check whether the OpenStackControlPlane can still be applied or whether something is wrong.

In that case, follow this procedure:

1.- Create three repositories for telemetry-operator in your quay.io personal account: telemetry-operator, telemetry-operator-bundle and telemetry-operator-index. The three of them must be public.

2.- Create three repositories for openstack-operator in your quay.io personal account: openstack-operator, openstack-operator-bundle and openstack-operator-index. The three of them must be public.

3.- Commit your changes in telemetry-operator and push the commit to the fork in your personal repository.

4.- Introduce a replace rule in openstack-operator/go.mod and openstack-operator/apis/go.mod like this:

replace github.com/openstack-k8s-operators/telemetry-operator/api => github.com/<github_user>/telemetry-operator/api <commit_id>

Then run:

make tidy

This makes the line look something like this:

replace github.com/openstack-k8s-operators/telemetry-operator/api => github.com/<github_user>/telemetry-operator/api v0.1.1-0.20240715084507-c8fd68f4cc2c
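Instead of editing go.mod by hand, the replace rule from step 4 can probably be added with go mod edit, which accepts a -replace=old=new@version argument. This is a sketch under assumptions: the helper name replace_arg is hypothetical, and it assumes the commit has already been pushed to your fork so that make tidy can resolve it.

```shell
# Print the -replace argument for `go mod edit`, pointing the upstream
# API module at a fork. $1: GitHub user owning the fork, $2: commit id.
replace_arg() {
    upstream=github.com/openstack-k8s-operators/telemetry-operator/api
    printf '%s\n' "-replace=${upstream}=github.com/$1/telemetry-operator/api@$2"
}

# Usage, from the openstack-operator checkout:
#   go mod edit "$(replace_arg <github_user> <commit_id>)"
#   (cd apis && go mod edit "$(replace_arg <github_user> <commit_id>)")
#   make tidy
```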

5.- Build the telemetry-operator bundle, push it to your quay.io account, and tag the bundle with your commit id so that the tag points to the version you have just built. The tag is used to find the exact sha of the image that is going to be used:

IMAGE_TAG_BASE=quay.io/<quay_user>/telemetry-operator VERSION=0.0.2 IMG=$IMAGE_TAG_BASE:v$VERSION make manifests build docker-build docker-push bundle bundle-build bundle-push catalog-build catalog-push

podman tag quay.io/<quay_user>/telemetry-operator-bundle:v0.0.2 quay.io/<quay_user>/telemetry-operator-bundle:<commit_id>
podman push quay.io/<quay_user>/telemetry-operator-bundle:<commit_id>
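To avoid typing the commit id by hand, the two podman commands can be generated from the current checkout. This is only a sketch: the helper name bundle_tag_cmds is hypothetical, and whether the tag should be the full or the abbreviated commit id is not stated here, so the usage line assumes the full hash from git rev-parse HEAD.

```shell
# Print the podman commands that tag and push the bundle under a commit id.
# $1: quay.io user, $2: bundle version without the leading "v", $3: commit id.
bundle_tag_cmds() {
    repo="quay.io/$1/telemetry-operator-bundle"
    printf 'podman tag %s:v%s %s:%s\n' "$repo" "$2" "$repo" "$3"
    printf 'podman push %s:%s\n' "$repo" "$3"
}

# Usage, from the telemetry-operator checkout (review the output first,
# then pipe it to a shell to execute it):
#   bundle_tag_cmds <quay_user> 0.0.2 "$(git rev-parse HEAD)"
#   bundle_tag_cmds <quay_user> 0.0.2 "$(git rev-parse HEAD)" | sh -x
```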

6.- Build openstack-operator bundle and push it to your quay.io account:

IMAGE_TAG_BASE=quay.io/<quay_user>/openstack-operator VERSION=0.0.2 IMG=$IMAGE_TAG_BASE:v$VERSION make manifests build docker-build docker-push bundle bundle-build bundle-push catalog-build catalog-push

7.- Deploy openstack-operator with the recently built image and then deploy openstack:

OPENSTACK_IMG=quay.io/<quay_user>/openstack-operator-index:v0.0.2 make openstack

make openstack_deploy

NOTE: The commands above assume that your quay.io and GitHub account names are identical. If they differ, you must set IMAGENAMESPACE=<quay_user> in the bundle building commands, both for telemetry and openstack, like this:

IMAGENAMESPACE=<quay_user> IMAGE_TAG_BASE=quay.io/<quay_user>/openstack-operator VERSION=0.0.2 IMG=$IMAGE_TAG_BASE:v$VERSION make manifests build docker-build docker-push bundle bundle-build bundle-push catalog-build catalog-push

Connect to Dataplane nodes

You can connect directly to the compute nodes using password 12345678:

Testing edpm-ansible changes

1.- Build your custom openstack-ansibleee-runner image using these steps and push it to a registry

2.- Override DATAPLANE_RUNNER_IMG and ANSIBLEEE_IMAGE_URL_DEFAULT when running edpm_deploy

cd ~/install_yamls/
DATAPLANE_RUNNER_IMG=<url_to_custom_image> ANSIBLEEE_IMAGE_URL_DEFAULT=<url_to_custom_image> make edpm_deploy

3.- During deployment, the dataplane-deployment-* pods are spawned with the custom image.

Testing edpm-ansible changes using a volume mount backed by NFS server (tested)

The following procedure is to be performed after the

1.- Follow all these steps to add a PVC to OpenStackDataPlaneNodeSet’s CR.

2.- Deploy a debug pod with the PVC attached to verify whether your local repository was mounted into it:

oc apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
  labels:
    app: debug
spec:
  containers:
    - name: debug-container
      image: busybox  # Replace with your preferred image for debugging
      command: ["/bin/sh", "-c", "sleep 3600"]  # Keeps the pod alive for an hour
      volumeMounts:
        - mountPath: /mnt
          name: pvc-volume
  volumes:
    - name: pvc-volume
      persistentVolumeClaim:
        claimName: edpm-ansible-dev  # Name of the PVC
  restartPolicy: Never  # Ensures the pod doesn't restart automatically
EOF

Verify that the repository exists:

$ oc exec debug-pod -- ls /mnt
CHANGELOG.md               OWNERS_ALIASES             contribute                 molecule                   plugins                    tests
LICENSE                    README.md                  docs                       molecule-requirements.txt  requirements.yml           zuul.d
Makefile                   app-root                   galaxy.yml                 openstack_ansibleee        roles
OWNERS                     bindep.txt                 meta                       playbooks                  scripts

Delete the pod once verified (for example, with oc delete pod debug-pod).

3.- Once the step to add extraMount to the OpenStackDataPlaneNodeSet CR is executed, the deployment is reported as "Deployment not started", which is expected.

4.- Create a new EDPM deployment using the existing nodeSets

oc apply -f - <<EOF
apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneDeployment
metadata:
  name: edpm-deployment-debug
spec:
  nodeSets:
    - openstack-edpm-ipam
EOF

The deployment then progresses using the edpm-ansible repository from the NFS mount:

oc get osdpns
NAME                  STATUS   MESSAGE
openstack-edpm-ipam   False    Deployment in progress
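To wait for the nodeset instead of re-running oc get osdpns manually, the STATUS column can be polled. The sketch below assumes that STATUS turns to True once the deployment succeeds, which the output above does not confirm; the helper name nodeset_status and the polling interval are hypothetical choices.

```shell
# Read `oc get osdpns` output on stdin and print the STATUS column
# of the nodeset named in $1.
nodeset_status() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage: poll until the nodeset reports True (assumed success value).
#   until [ "$(oc get osdpns | nodeset_status openstack-edpm-ipam)" = "True" ]; do
#       sleep 30
#   done
```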

Running kuttl tests locally

For the default suite, simply run make kuttl-test.

For standalone suites, you must follow these steps:

  1. Set up the testing namespace using install_yamls: cd install_yamls && make kuttl_common_prep heat heat_deploy NAMESPACE=telemetry-kuttl-tests
  2. (Optionally) Edit the list of suites to run: cd telemetry-operator && vi kuttl-test.yaml (Comment out any suites you don't need from the testDirs list)
  3. Run kuttl specifying that config file and namespace: kubectl-kuttl test --config ./kuttl-test.yaml --namespace telemetry-kuttl-tests

NOTE - (May 2024) These tests appear very reliable when running a single suite, but occasional flakiness (~ 25% failure) has been observed when they all run serially. This problem appears to be order/timing related and may or may not affect the automated CI.
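Given the occasional flakiness, it can help to rerun a failing run a few times before investigating, to tell an order/timing problem from a real regression. This is a generic retry sketch, not something provided by kuttl or this repository; the function name retry is hypothetical.

```shell
# Run a command up to $1 times, stopping at the first success.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# Usage:
#   retry 3 kubectl-kuttl test --config ./kuttl-test.yaml --namespace telemetry-kuttl-tests
```

A run that only passes on a retry still points at a timing issue worth noting, so treat this as a diagnostic aid rather than a fix.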

Destroy the environment to start again

cd install_yamls/devsetup

# Delete the CRC node
make crc_cleanup

# Destroy edpm VMs
EDPM_TOTAL_NODES=2 make edpm_compute_cleanup

Emergency rescue access to CRC VM

If you need to connect directly to the CRC VM, use:

ssh -i ~/.crc/machines/crc/id_ecdsa core@"192.168.130.11"

License

Copyright 2023.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
