prombench: Added support for --bench.version and --bench.directory flags. #812

Merged: 9 commits, Dec 19, 2024
Changes from 4 commits
71 changes: 55 additions & 16 deletions prombench/Makefile
@@ -1,12 +1,5 @@
INFRA_CMD ?= ../infra/infra

PROVIDER ?= gke

.PHONY: deploy clean
deploy: node_create resource_apply
# GCP sometimes takes longer than 30 tries when trying to delete nodes
# if k8s resources are not already cleared
clean: resource_delete node_delete
INFRA_CMD ?= ../infra/infra
PROVIDER ?= gke

cluster_create:
${INFRA_CMD} ${PROVIDER} cluster create -a ${AUTH_FILE} \
@@ -37,50 +30,96 @@ cluster_delete:
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/cluster_${PROVIDER}.yaml

# /prombench <...> --bench.directory
BENCHMARK_DIRECTORY := $(if $(BENCHMARK_DIRECTORY),$(BENCHMARK_DIRECTORY),manifests/prombench)
# /prombench <...> --bench.version
BENCHMARK_VERSION := $(if $(BENCHMARK_VERSION),$(BENCHMARK_VERSION),master)
PROMBENCH_GIT_REPOSITORY ?= git@github.com:prometheus/test-infra.git
PROMBENCH_DIR ?= .

# maybe_pull_custom_version allows custom benchmarking as designed in
# https://github.com/prometheus/proposals/pull/41. It allows calling
# /prombench <release> --bench.version=<@commit or branch>, which causes the
# prombench GH job on the Prometheus repo to call the infra CLI with a non-master BENCHMARK_VERSION.
# In such a case we pull the prombench repository for the given branch or commit
# and adjust PROMBENCH_DIR. As a result, the `make deploy` and `make clean` jobs
# will apply the custom manifests or even node pools.
.PHONY: maybe_pull_custom_version
maybe_pull_custom_version:
ifeq (${BENCHMARK_VERSION},master)
@echo ">> Using standard benchmark configuration, from the docker image"
else
@echo ">> Git pulling custom benchmark configuration from the ${BENCHMARK_VERSION}"
@$(eval $@_TMP_DIR=$(shell mktemp -d -t "prombench"))
cd ${$@_TMP_DIR} && git clone ${PROMBENCH_GIT_REPOSITORY}
ifeq ($(subst @,,${BENCHMARK_VERSION}),${BENCHMARK_VERSION})
@echo ">> --bench.version is a branch, resetting to origin/${BENCHMARK_VERSION}"
cd ${$@_TMP_DIR}/test-infra && git reset --hard origin/${BENCHMARK_VERSION}
else
@echo ">> --bench.version is a commit SHA, resetting to $(subst @,,${BENCHMARK_VERSION})"
cd ${$@_TMP_DIR}/test-infra && git reset --hard $(subst @,,${BENCHMARK_VERSION})
endif
$(eval PROMBENCH_DIR=${$@_TMP_DIR}/test-infra/prombench)
endif
@echo ">> Using following files in ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}"
@ls -lR ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}

.PHONY: clean_tmp_dir
clean_tmp_dir: # Clean after maybe_pull_custom_version
[ -z ${maybe_pull_custom_version_TMP_DIR} ] || rm -rf ${maybe_pull_custom_version_TMP_DIR}

.PHONY: deploy
deploy: maybe_pull_custom_version node_create resource_apply clean_tmp_dir

.PHONY: clean
# GCP sometimes takes longer than 30 tries when trying to delete nodes
# if k8s resources are not already cleared
clean: maybe_pull_custom_version resource_delete node_delete clean_tmp_dir

node_create:
${INFRA_CMD} ${PROVIDER} nodes create -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v EKS_WORKER_ROLE_ARN:${EKS_WORKER_ROLE_ARN} -v EKS_CLUSTER_ROLE_ARN:${EKS_CLUSTER_ROLE_ARN} \
-v EKS_SUBNET_IDS:${EKS_SUBNET_IDS} \
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/prombench/nodes_${PROVIDER}.yaml
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/nodes_${PROVIDER}.yaml

resource_apply:
$(INFRA_CMD) ${PROVIDER} resource apply -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v CLUSTER_NAME:${CLUSTER_NAME} \
-v PR_NUMBER:${PR_NUMBER} -v RELEASE:${RELEASE} -v DOMAIN_NAME:${DOMAIN_NAME} \
-v GITHUB_ORG:${GITHUB_ORG} -v GITHUB_REPO:${GITHUB_REPO} \
-f manifests/prombench/benchmark
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/benchmark

# Required because namespace and cluster-role are not part of the created nodes
resource_delete:
$(INFRA_CMD) ${PROVIDER} resource delete -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/prombench/benchmark/1c_cluster-role-binding.yaml \
-f manifests/prombench/benchmark/1a_namespace.yaml
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/benchmark/1c_cluster-role-binding.yaml \
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/benchmark/1a_namespace.yaml

node_delete:
$(INFRA_CMD) ${PROVIDER} nodes delete -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v EKS_WORKER_ROLE_ARN:${EKS_WORKER_ROLE_ARN} -v EKS_CLUSTER_ROLE_ARN:${EKS_CLUSTER_ROLE_ARN} \
-v EKS_SUBNET_IDS:${EKS_SUBNET_IDS} \
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/prombench/nodes_${PROVIDER}.yaml
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/nodes_${PROVIDER}.yaml

all_nodes_running:
$(INFRA_CMD) ${PROVIDER} nodes check-running -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v EKS_WORKER_ROLE_ARN:${EKS_WORKER_ROLE_ARN} -v EKS_CLUSTER_ROLE_ARN:${EKS_CLUSTER_ROLE_ARN} \
-v EKS_SUBNET_IDS:${EKS_SUBNET_IDS} -v SEPARATOR:${SEPARATOR} \
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/prombench/nodes_${PROVIDER}.yaml
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/nodes_${PROVIDER}.yaml

all_nodes_deleted:
$(INFRA_CMD) ${PROVIDER} nodes check-deleted -a ${AUTH_FILE} \
-v ZONE:${ZONE} -v GKE_PROJECT_ID:${GKE_PROJECT_ID} \
-v EKS_WORKER_ROLE_ARN:${EKS_WORKER_ROLE_ARN} -v EKS_CLUSTER_ROLE_ARN:${EKS_CLUSTER_ROLE_ARN} \
-v EKS_SUBNET_IDS:${EKS_SUBNET_IDS} -v SEPARATOR:${SEPARATOR} \
-v CLUSTER_NAME:${CLUSTER_NAME} -v PR_NUMBER:${PR_NUMBER} \
-f manifests/prombench/nodes_${PROVIDER}.yaml
-f ${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/nodes_${PROVIDER}.yaml
29 changes: 20 additions & 9 deletions prombench/README.md
@@ -4,24 +4,24 @@

This setup leverages **GitHub Actions** and **Google Kubernetes Engine (GKE)**, but is designed to be extendable to other Kubernetes providers.

## Overview of Manifest Files
## Configuration Files

The `/manifest` directory contains Kubernetes manifest files:
The `./manifests` directory contains configuration files:

- **`cluster_gke.yaml`**: Creates the Main Node in GKE.
- **`cluster_eks.yaml`**: Creates the Main Node in EKS.
- **`cluster-infra/`**: Contains persistent components of the Main Node.
- **`prombench/`**: Resources created and destroyed for each Prombench test.
- **`./manifests/cluster_gke.yaml`**: Creates the Main Node in GKE.
- **`./manifests/cluster_eks.yaml`**: Creates the Main Node in EKS.
- **`./manifests/cluster-infra/`**: Contains persistent components of the Main Node.
- **`./manifests/prombench/`**: Resources created and destroyed for each Prombench test. See its [`README.md`](./manifests/prombench/README.md) for details.

## Setup and Running Prombench
## Prombench Setup

Prombench can be run on different providers. Follow these instructions based on your provider:

- [Google Kubernetes Engine (GKE)](docs/gke.md)
- [Kubernetes In Docker (KIND)](docs/kind.md)
- [Elastic Kubernetes Service (EKS)](docs/eks.md)

## Setting Up GitHub Actions
### Setting Up GitHub Actions

1. Place a workflow file in the `.github` directory of your repository. Refer to the [Prometheus GitHub repository](https://github.com/prometheus/prometheus) for an example.

@@ -30,27 +30,38 @@ Prombench can be run on different providers. Follow these instructions based on
```bash
cat $AUTH_FILE | base64 -w 0
```

3. Configure a webhook to the cluster's comment-monitor as described [here](../tools/comment-monitor/README.md#setting-up-the-github-webhook).

## Prombench Usage

### Triggering Tests via GitHub Comment

**Starting Tests:**

- `/prombench main` or `/prombench master` - Compare PR with the main/master branch.
- `/prombench v2.4.0` - Compare PR with a specific release version (e.g., from [quay.io/prometheus/prometheus:releaseVersion](https://quay.io/prometheus/prometheus:releaseVersion)).
- `/prombench v2.4.0 --bench.version=@aca1803ccf5d795eee4b0848707eab26d05965cc` - Compare with the 2.4.0 release, but use the `aca1803ccf5d795eee4b0848707eab26d05965cc` commit of this repository for the `./manifests/prombench` resources.
- `/prombench v2.4.0 --bench.version=mybranch` - Compare with the 2.4.0 release, but use the `mybranch` branch of this repository for the `./manifests/prombench` resources.
- `/prombench v2.4.0 --bench.directory=manifests/prombench-agent-mode` - Compare with the 2.4.0 release, but use a different resource directory on the `master` branch of this repository. Currently only `./manifests/prombench` (the default) is available; more modes might be added in the future.

**Restarting Tests:**

- `/prombench restart <release_version>`
- `/prombench restart <release_version> --bench.version=... --bench.directory=...`

**Stopping Tests:**

- `/prombench cancel`

**Printing available commands:**

- `/prombench help`
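
The flag forms listed above end up as the `BENCHMARK_VERSION` and `BENCHMARK_DIRECTORY` variables consumed by the Makefile. As a rough illustration only, the sketch below splits a start comment into those variables. `parse_prombench_comment` is a hypothetical helper, not the comment-monitor implementation, and it assumes the simple start form `/prombench <release> [flags]`.

```bash
#!/bin/sh
# Illustrative only: split a start comment into the variable names used by
# the Makefile. parse_prombench_comment is a hypothetical helper and is not
# part of comment-monitor.
parse_prombench_comment() {
    RELEASE=""
    BENCHMARK_VERSION=""
    BENCHMARK_DIRECTORY=""
    # Intentionally unquoted: split the comment on whitespace.
    set -- $1
    shift # drop the leading "/prombench"
    for word in "$@"; do
        case "$word" in
            --bench.version=*)   BENCHMARK_VERSION="${word#--bench.version=}" ;;
            --bench.directory=*) BENCHMARK_DIRECTORY="${word#--bench.directory=}" ;;
            *)                   RELEASE="$word" ;;
        esac
    done
}

parse_prombench_comment "/prombench v2.4.0 --bench.version=mybranch"
# An unset directory falls back to the documented default.
echo "RELEASE=${RELEASE} BENCHMARK_VERSION=${BENCHMARK_VERSION} BENCHMARK_DIRECTORY=${BENCHMARK_DIRECTORY:-manifests/prombench}"
```

The sketch does no validation of the release argument; the real tool restricts it with a regular expression.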

### Building the Docker Image

Build the Docker image with:

```bash
docker build -t prominfra/prombench:master .
```

@@ -12,8 +12,14 @@ data:
* To restart benchmark: `/prombench restart <branch or git tag to compare with>`
* To stop benchmark: `/prombench cancel`
* To print help: `/prombench help`

**Example:** `/prombench v3.0.0`

**Advanced Flags for `start` and `restart` Commands:**
* `--bench.directory` (default: `manifests/prombench`)
* `--bench.version` (default: `master`)

**Examples:**
* `/prombench v3.0.0`
* `/prombench v3.0.0 --bench.version=@aca1803ccf5d795eee4b0848707eab26d05965cc --bench.directory=manifests/prombench`

verify_user: true
commands:
@@ -24,12 +30,25 @@ data:

- name: restart
event_type: prombench_restart
args_regex: (?P<RELEASE>master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*)$
arg_regex: (master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*)
arg_name: RELEASE
flag_args:
bench.directory: BENCHMARK_DIRECTORY
bench.version: BENCHMARK_VERSION
comment_template: |
⏱️ Welcome (again) to Prometheus Benchmarking Tool. ⏱️

**Compared versions:** [**`PR-{{ index . "PR_NUMBER" }}`**](http://{{ index . "DOMAIN_NAME" }}/{{ index . "PR_NUMBER" }}/prometheus-pr) and [**`{{ index . "RELEASE" }}`**](http://{{ index . "DOMAIN_NAME" }}/{{ index . "PR_NUMBER" }}/prometheus-release)


{{- $version := index . "BENCHMARK_VERSION" }}
{{- $directory := index . "BENCHMARK_DIRECTORY" | print "manifests/prombench" }}
{{- with $version }}

**Custom benchmark version:**
{{- if hasPrefix $version "@" }} [**`{{ trimPrefix $version "@" }}` commit**](https://github.com/prometheus/test-infra/tree/{{ trimPrefix $version "@" }}/prombench/{{ $directory }})
{{- else }} [**`{{ $version }}` branch**](https://github.com/prometheus/test-infra/tree/{{ $version }}/prombench/{{ $directory }}){{ end }}
{{- end }}

After successful deployment ([check status here](https://github.com/prometheus/prometheus/actions/workflows/prombench.yml)), the benchmarking results can be viewed at:

- [Prometheus Meta](http://{{ index . "DOMAIN_NAME" }}/prometheus-meta/graph?g0.expr={namespace%3D"prombench-{{ index . "PR_NUMBER" }}"}&g0.tab=1)
@@ -38,19 +57,32 @@ data:
- [Parca profiles (e.g. in-use memory)](http://{{ index . "DOMAIN_NAME" }}/profiles?expression_a=memory%3Ainuse_space%3Abytes%3Aspace%3Abytes%7Bpr_number%3D%22{{ index . "PR_NUMBER" }}%22%7D&time_selection_a=relative:minute|15)

**Available Commands:**
* To restart benchmark: `/prombench restart {{ index . "RELEASE" }}`
* To restart benchmark: `/prombench restart {{ index . "RELEASE" }}{{ if index . "BENCHMARK_VERSION" }} --bench.version={{ index . "BENCHMARK_VERSION" }}{{ end }}{{ if index . "BENCHMARK_DIRECTORY" }} --bench.directory={{ index . "BENCHMARK_DIRECTORY" }}{{ end }}`
* To stop benchmark: `/prombench cancel`
* To print help: `/prombench help`

- name: "" # start is a default (empty command).
event_type: prombench_start
args_regex: (?P<RELEASE>master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*)$
arg_regex: (master|main|v[0-9]+\.[0-9]+\.[0-9]+\S*)
arg_name: RELEASE
flag_args:
bench.directory: BENCHMARK_DIRECTORY
bench.version: BENCHMARK_VERSION
label: prombench
comment_template: |
⏱️ Welcome to Prometheus Benchmarking Tool. ⏱️

**Compared versions:** [**`PR-{{ index . "PR_NUMBER" }}`**](http://{{ index . "DOMAIN_NAME" }}/{{ index . "PR_NUMBER" }}/prometheus-pr) and [**`{{ index . "RELEASE" }}`**](http://{{ index . "DOMAIN_NAME" }}/{{ index . "PR_NUMBER" }}/prometheus-release)

{{- $version := index . "BENCHMARK_VERSION" }}
{{- $directory := index . "BENCHMARK_DIRECTORY" | print "manifests/prombench" }}
{{- with $version }}

**Custom benchmark version:**
{{- if hasPrefix $version "@" }} [**`{{ trimPrefix $version "@" }}` commit**](https://github.com/prometheus/test-infra/tree/{{ trimPrefix $version "@" }}/prombench/{{ $directory }})
{{- else }} [**`{{ $version }}` branch**](https://github.com/prometheus/test-infra/tree/{{ $version }}/prombench/{{ $directory }}){{ end }}
{{- end }}

After the successful deployment ([check status here](https://github.com/prometheus/prometheus/actions/workflows/prombench.yml)), the benchmarking results can be viewed at:

- [Prometheus Meta](http://{{ index . "DOMAIN_NAME" }}/prometheus-meta/graph?g0.expr={namespace%3D"prombench-{{ index . "PR_NUMBER" }}"}&g0.tab=1)
@@ -59,7 +91,7 @@ data:
- [Parca profiles (e.g. in-use memory)](http://{{ index . "DOMAIN_NAME" }}/profiles?expression_a=memory%3Ainuse_space%3Abytes%3Aspace%3Abytes%7Bpr_number%3D%22{{ index . "PR_NUMBER" }}%22%7D&time_selection_a=relative:minute|15)

**Available Commands:**
* To restart benchmark: `/prombench restart {{ index . "RELEASE" }}`
* To restart benchmark: `/prombench restart {{ index . "RELEASE" }}{{ if index . "BENCHMARK_VERSION" }} --bench.version={{ index . "BENCHMARK_VERSION" }}{{ end }}{{ if index . "BENCHMARK_DIRECTORY" }} --bench.directory={{ index . "BENCHMARK_DIRECTORY" }}{{ end }}`
* To stop benchmark: `/prombench cancel`
* To print help: `/prombench help`

53 changes: 53 additions & 0 deletions prombench/manifests/prombench/README.md
@@ -0,0 +1,53 @@
## Prombench Benchmark Scenario Configuration

This directory contains resources that are applied on every benchmark request
via the `infra` CLI using [`make deploy`](../../Makefile) and cleaned up using [`make clean`](../../Makefile).

It assumes a running cluster that was created via the `infra` CLI using `make cluster_create` (and that is later removed using `make cluster_delete`).

### Customizations

#### Benchmarking from the custom test-infra commit/branch

> NOTE: See https://github.com/prometheus/proposals/pull/41 for design.

On the `master` branch, in this directory, we maintain the standard, single benchmarking scenario used
as an acceptance validation for Prometheus. It's important to ensure it represents a common Prometheus configuration.

The only user-provided parameter for the standard scenario is the `RELEASE` version.

However, it's possible to create fully custom benchmarking scenarios for `/prombench` via the `--bench.version=<branch|@commit>` flag.

Example steps:

1. Create a new branch on https://github.com/prometheus/test-infra e.g. `benchmark/scenario1`.
2. Modify this directory to your liking, e.g. by changing the query load, the metric load, or advanced Prometheus configuration. It's also possible to make the Prometheus deployments and versions exactly the same and vary only a single configuration flag, for feature benchmarking.

> WARN: When customizing this directory, don't rename `1a_namespace.yaml` or `1c_cluster-role-binding.yaml`, as these filenames are used by the cleanup routine. If you do rename them, make sure you understand the effect on the [`make clean` job](../../Makefile).

3. Push changes to the new branch.
4. From a Prometheus PR comment, call prombench as `/prombench <release> --bench.version=benchmark/scenario1` or `/prombench <release> --bench.version=@<relevant commit SHA from benchmark/scenario1>` to use the configuration files from this custom branch.
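
The branch-versus-commit rule behind `--bench.version` can be sketched in shell. This mirrors what the `maybe_pull_custom_version` Makefile target passes to `git reset --hard`, under the assumption that a commit is always written with a leading `@`. The function name `resolve_bench_ref` is hypothetical and exists only for this illustration.

```bash
#!/bin/sh
# Hypothetical helper: map a --bench.version value to the git ref that
# `git reset --hard` would receive after cloning test-infra.
resolve_bench_ref() {
    version="$1"
    case "$version" in
        @*) printf '%s\n' "${version#@}" ;;    # "@<sha>": strip the marker, use the commit SHA
        *)  printf 'origin/%s\n' "$version" ;; # anything else is treated as a branch name
    esac
}

resolve_bench_ref "benchmark/scenario1"
resolve_bench_ref "@aca1803ccf5d795eee4b0848707eab26d05965cc"
```

The first call prints `origin/benchmark/scenario1`; the second prints the bare SHA without the `@` prefix.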

Other details:

* Custom branch modifications outside this directory (e.g. to the infra CLI or makefiles) do not affect prombench.
* `--bench.version` is designed for short-term or even one-off benchmark scenario configurations. It's not designed for long-term, well-maintained scenarios. For the latter, we can later maintain e.g. multiple `manifests/prombench` directories and select one via the [`--bench.directory` flag](#benchmarking-from-the-different-directory).
* Non-maintainers can follow a similar process, but they will need to ask a maintainer for a new branch and a PR review. We can consider extending `--bench.version` to support remote repositories if this becomes a problem.
* The custom benchmarking logic is implemented in the [`maybe_pull_custom_version` make job](../../Makefile) and invoked by the prombench GH job on the Prometheus repo on `deploy` and `clean`.

#### Benchmarking from the different directory

On top of the commit/branch, you can also specify a custom directory with `--bench.directory` (it defaults to this directory, i.e. `manifests/prombench`). This is intended for the case where we ever want to maintain standard benchmark modes for a longer time, e.g. agent mode.

For one-off benchmarks prefer one-off branches.
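
As a minimal sketch of how the directory flag feeds into file selection, the function below rebuilds the node manifest path the same way the Makefile composes `${PROMBENCH_DIR}/${BENCHMARK_DIRECTORY}/nodes_${PROVIDER}.yaml`, using the Makefile defaults (`.`, `manifests/prombench`, `gke`). The function name `manifest_path` is an assumption made for this example.

```bash
#!/bin/sh
# Hypothetical helper: compose the nodes manifest path from the same
# variables and defaults that the Makefile uses.
manifest_path() {
    prombench_dir="${PROMBENCH_DIR:-.}"
    bench_dir="${BENCHMARK_DIRECTORY:-manifests/prombench}"
    provider="${PROVIDER:-gke}"
    printf '%s/%s/nodes_%s.yaml\n' "$prombench_dir" "$bench_dir" "$provider"
}

# Default scenario, then a custom directory as in the --bench.directory example.
manifest_path
BENCHMARK_DIRECTORY=manifests/prombench-agent-mode manifest_path
```

When `--bench.version` is also set, `PROMBENCH_DIR` points at the temporary clone instead of `.`, so both flags combine into one path.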

### Variables

It expects the following templated variables:

* `.PR_NUMBER`: The PR number from which `/prombench` was triggered. This PR number also determines which commit is used to build the `prometheus-test-pr-{{ .PR_NUMBER }}` Prometheus image (in the init container).
* `.RELEASE`: The argument provided by the `/prombench` caller, representing the Prometheus version (the docker image tag for `quay.io/prometheus/prometheus:{{ .RELEASE }}`) to compare with, deployed as `prometheus-test-{{ .RELEASE }}`.
* `.DOMAIN_NAME`
* `.LOADGEN_SCALE_UP_REPLICAS`
* `.GITHUB_ORG`
* `.GITHUB_REPO`
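
To make the effect of `.PR_NUMBER` and `.RELEASE` concrete, the sketch below prints the names that the descriptions above derive from them. `render_names` is a hypothetical helper, and the PR number `12345` is a made-up value used only for illustration.

```bash
#!/bin/sh
# Hypothetical helper: print the names derived from .PR_NUMBER and .RELEASE,
# using the formats given in the variable descriptions.
render_names() {
    pr_number="$1"
    release="$2"
    printf 'prometheus-test-pr-%s\n' "$pr_number"          # deployment built from the PR
    printf 'quay.io/prometheus/prometheus:%s\n' "$release" # release image to compare with
    printf 'prometheus-test-%s\n' "$release"               # deployment running the release
}

render_names 12345 v3.0.0
```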