Merge remote-tracking branch 'origin/master' into curl-in-base
Showing 18 changed files with 532 additions and 95 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
name: Evaluate LLM test cases | ||
|
||
on: [push] | ||
|
||
jobs: | ||
  build: | ||
    strategy: | ||
      matrix: | ||
        python-version: ["3.12"] | ||
|
||
    runs-on: ubuntu-latest | ||
|
||
    steps: | ||
      - uses: actions/checkout@v2 | ||
|
||
      - name: Set up Python ${{ matrix.python-version }} | ||
        uses: actions/setup-python@v2 | ||
        with: | ||
          python-version: ${{ matrix.python-version }} | ||
|
||
      - name: Install Python dependencies and build | ||
        # if you change something here, you must also change it in .github/workflows/build-binaries-and-brew.yaml | ||
        run: | | ||
          python -m pip install --upgrade pip setuptools pyinstaller | ||
          curl -sSL https://install.python-poetry.org | python3 - --version 1.4.0 | ||
          poetry config virtualenvs.create false | ||
          poetry install --no-root | ||
          poetry run python -m playwright install --with-deps firefox | ||
      - name: Run tests | ||
        shell: bash | ||
        env: | ||
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} | ||
        run: | | ||
          poetry run pytest -m "llm" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
|
||
[pytest] | ||
markers = | ||
    llm: Evaluate LLM behaviour (prompt, tools, etc.) |
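The `markers` entry registers the custom `llm` marker that the workflow selects with `pytest -m "llm"`. As a rough illustration of how this ini layout reads (the marker line is an indented continuation of the `markers` value), here is a minimal sketch using Python's standard `configparser`. pytest uses its own ini parser, so this is only an approximation, and `registered_markers` is a hypothetical helper that is not part of this commit.

```python
import configparser

# Same content as the pytest.ini hunk above; the marker line is indented
# so that it is read as a continuation of the `markers` value.
PYTEST_INI = """
[pytest]
markers =
    llm: Evaluate LLM behaviour (prompt, tools, etc.)
"""


def registered_markers(ini_text: str) -> dict[str, str]:
    """Return {marker_name: description} from a pytest-style ini string."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    markers = {}
    for line in parser.get("pytest", "markers").splitlines():
        if not line.strip():
            continue  # the value starts with an empty first line
        name, _, description = line.partition(":")
        markers[name.strip()] = description.strip()
    return markers


print(registered_markers(PYTEST_INI))
```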
54 changes: 54 additions & 0 deletions
tests/fixtures/test_chat/3_what_is_the_command_to_port_forward/kubectl_describe.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_describe","match_params":{"kind":"pod","name":"my_grafana_4j981","namespace":"default"}} | ||
Name: my_grafana_4j981 | ||
Namespace: default | ||
Priority: 0 | ||
Service Account: my_grafana_4j981-service-account | ||
Node: ip-172-31-21-139.us-east-2.compute.internal/172.31.21.139 | ||
Start Time: Mon, 04 Nov 2024 10:28:53 +0100 | ||
Labels: app=grafana | ||
pod-template-hash=6958c5bdd8 | ||
Annotations: <none> | ||
Status: Running | ||
IP: 172.31.25.172 | ||
IPs: | ||
IP: 172.31.25.172 | ||
Controlled By: ReplicaSet/my_grafana_4j981 | ||
Containers: | ||
runner: | ||
Container ID: containerd://b1d346ba710299dd3e1c1745c362062570488b57356072dbc4637cbf6b77ccb2 | ||
Image: robustadev/grafana:0.18.0 | ||
Image ID: docker.io/robustadev/grafana@sha256:273035ec62f104da1452d65fc30cfcb0085e8a49ce73b9ffa043f747f3afc31b | ||
Port: 3000 | ||
Host Port: <none> | ||
State: Running | ||
Started: Mon, 04 Nov 2024 10:29:17 +0100 | ||
Ready: True | ||
Restart Count: 0 | ||
Limits: | ||
memory: 1Gi | ||
Requests: | ||
cpu: 250m | ||
memory: 1Gi | ||
Mounts: | ||
/etc/robusta/auth from auth-config-secret (rw) | ||
/etc/robusta/config from playbooks-config-secret (rw) | ||
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-prfkr (ro) | ||
Conditions: | ||
Type Status | ||
PodReadyToStartContainers True | ||
Initialized True | ||
Ready True | ||
ContainersReady True | ||
PodScheduled True | ||
QoS Class: Burstable | ||
Node-Selectors: <none> | ||
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s | ||
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s | ||
Events: | ||
Type Reason Age From Message | ||
---- ------ ---- ---- ------- | ||
Normal Scheduled 4m28s default-scheduler Successfully assigned default/nginxreplica to aks-nodepool1-26081864-vmss000004 | ||
Normal Pulling 4m28s kubelet Pulling image "grafana" | ||
Normal Pulled 4m28s kubelet Successfully pulled image "grafana" in 272.563572ms | ||
Normal Created 4m28s kubelet Created container grafana | ||
Normal Started 4m28s kubelet Started container grafana |
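Each fixture in this commit appears to follow one layout: a first line of JSON naming the toolset, the tool, and the parameters to match, followed by the raw tool output to replay. The loader that consumes these files is not part of this diff, so the following is only a minimal sketch of how such a file could be split. `parse_fixture` and `MockedToolCall` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class MockedToolCall:
    toolset_name: str
    tool_name: str
    match_params: dict
    output: str


def parse_fixture(text: str) -> MockedToolCall:
    """Split a fixture into its JSON header line and the mocked output."""
    header, _, body = text.partition("\n")
    meta = json.loads(header)
    return MockedToolCall(
        toolset_name=meta["toolset_name"],
        tool_name=meta["tool_name"],
        match_params=meta.get("match_params", {}),
        output=body,
    )


# Shortened copy of the kubectl_describe fixture above.
sample = (
    '{"toolset_name":"kubernetes/core","tool_name":"kubectl_describe",'
    '"match_params":{"kind":"pod","name":"my_grafana_4j981","namespace":"default"}}\n'
    "Name: my_grafana_4j981\n"
    "Namespace: default\n"
)
mock = parse_fixture(sample)
print(mock.tool_name, mock.match_params["name"])
```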
17 changes: 12 additions & 5 deletions
tests/fixtures/test_chat/3_what_is_the_command_to_port_forward/test_case.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,16 @@ | ||
  user_prompt: 'what is the command to port-forward to << { "type": "pod", "name": "my_grafana_4j981" } >>' | ||
  #user_prompt: "what is the command to port-forward to my grafana service?" | ||
- expected_output: "kubectl port-forward service/my_grafana_4j981 3000:3000" | ||
+ expected_output: | | ||
+   To port-forward to the pod `my_grafana_4j981`, use the following command: | ||
+   ```bash | ||
+   kubectl port-forward pod/my_grafana_4j981 3000:3000 -n default | ||
+   ``` | ||
+   This command forwards port 3000 on your local machine to port 3000 on the pod. | ||
  retrieval_context: | ||
-   - "The tool kubectl_get_all` reports that a grafana service is running but does not have an external IP address" | ||
-   - "The tool kubectl_get_all` reports that the name of the grafana service is my_grafana_4j981" | ||
-   - "By default grafana runs on port 3000. We can assume the user's grafana instance runs on that port" | ||
+   - "The grafana service is running but does not have an external IP address" | ||
+   - "The name of the grafana service is my_grafana_4j981" | ||
+   - "Grafana is running on port 3000" | ||
  evaluation: | ||
-   faithfulness: 0.2 | ||
+   faithfulness: 0.5 |
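The `evaluation` block reads as a set of per-metric minimum scores (here only `faithfulness`). The scoring code is not shown in this diff. Assuming each metric yields a score in [0, 1] that must meet or exceed its threshold, a check could look like the sketch below, where `failed_metrics` is a hypothetical helper and the scores are made-up values.

```python
def failed_metrics(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the metrics whose score is missing or below its threshold."""
    return [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, 0.0) < minimum
    ]


# Threshold taken from the test case above; the scores are invented.
thresholds = {"faithfulness": 0.5}
print(failed_metrics({"faithfulness": 0.72}, thresholds))  # []
print(failed_metrics({"faithfulness": 0.31}, thresholds))  # ['faithfulness']
```

A missing metric counts as a score of 0.0 in this sketch, so it fails any positive threshold but passes a threshold of 0.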
8 changes: 8 additions & 0 deletions
tests/fixtures/test_chat/4_related_k8s_events/kubectl_events.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_events","match_params":{"resource_type":"pod","pod_name":"nginx-6958c5bdd8-69gtn","namespace":"default"}} | ||
Type Reason Age From Message | ||
---- ------ ---- ---- ------- | ||
Normal Scheduled 4m28s default-scheduler Successfully assigned default/nginxreplica to aks-nodepool1-26081864-vmss000004 | ||
Normal Pulling 4m28s kubelet Pulling image "nginx" | ||
Normal Pulled 4m28s kubelet Successfully pulled image "nginx" in 272.563572ms | ||
Normal Created 4m28s kubelet Created container nginxreplica | ||
Normal Started 4m28s kubelet Started container nginxreplica< |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_events","match_params":{"resource_type":"pod","pod_name":"robusta-runner-746d848db9-f8ns9","namespace":"default"}} | ||
LAST SEEN TYPE REASON OBJECT MESSAGE | ||
35m Normal Pulling Pod/robusta-runner-746d848db9-f8ns9 Pulling image "robustadev/robusta-runner:0.19.0" | ||
35m Normal Scheduled Pod/robusta-runner-746d848db9-f8ns9 Successfully assigned default/robusta-runner-746d848db9-f8ns9 to kind-control-plane | ||
34m Normal Pulled Pod/robusta-runner-746d848db9-f8ns9 Successfully pulled image "robustadev/robusta-runner:0.19.0" in 13.508s (58.067s including waiting). Image size: 281790224 bytes. | ||
34m Normal Created Pod/robusta-runner-746d848db9-f8ns9 Created container runner | ||
34m Normal Started Pod/robusta-runner-746d848db9-f8ns9 Started container runner |
2 changes: 2 additions & 0 deletions
tests/fixtures/test_chat/7_get_pod_events/kubectl_find_resource.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_find_resource","match_params":{"kind":"pod","keyword":"robusta-runner"}} | ||
default robusta-runner-746d848db9-f8ns9 1/1 Running 0 16m 10.244.0.11 kind-control-plane <none> <none> app=robusta-runner,pod-template-hash=746d848db9,robustaComponent=runner |
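The `match_params` header suggests that a mocked response is chosen by comparing the parameters of a tool call with those recorded in the fixture. How the test harness really does this is not visible in this diff. A minimal sketch, assuming an exact match on tool name and parameters, with the hypothetical helper `find_mock`:

```python
from typing import Optional


def find_mock(mocks: list[dict], tool_name: str, params: dict) -> Optional[str]:
    """Return the recorded output of the first mock matching the call."""
    for mock in mocks:
        if mock["tool_name"] == tool_name and mock["match_params"] == params:
            return mock["output"]
    return None


# One mock built from the fixture above, with the output shortened.
mocks = [
    {
        "tool_name": "kubectl_find_resource",
        "match_params": {"kind": "pod", "keyword": "robusta-runner"},
        "output": "default robusta-runner-746d848db9-f8ns9 1/1 Running",
    }
]

hit = find_mock(mocks, "kubectl_find_resource", {"kind": "pod", "keyword": "robusta-runner"})
miss = find_mock(mocks, "kubectl_find_resource", {"kind": "pod", "keyword": "nginx"})
print(hit)
print(miss)
```

Exact matching keeps the sketch simple. A real harness might instead fall back to a live tool call or raise an error when no mock matches.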
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
user_prompt: "Fetch all kubernetes events related to the robusta-runner pod" | ||
expected_output: | | ||
  The `robusta-runner-746d848db9-f8ns9` pod in the `default` namespace has the following events: | ||
  1. **Pulling**: Pulling image `robustadev/robusta-runner:0.19.0`. | ||
  2. **Scheduled**: Successfully assigned default/robusta-runner-746d848db9-f8ns9 to kind-control-plane | ||
  3. **Pulled**: Successfully pulled image "robustadev/robusta-runner:0.19.0" in 13.508s (58.067s including waiting). Image size: 281790224 bytes. | ||
  4. **Created**: Created container `runner`. | ||
  5. **Started**: Started container `runner`. | ||
retrieval_context: | ||
  - | | ||
    Here are the events: | ||
    Pulling image "robustadev/robusta-runner:0.19.0" | ||
    Successfully assigned default/robusta-runner-746d848db9-f8ns9 to kind-control-plane | ||
    Successfully pulled image "robustadev/robusta-runner:0.19.0" in 13.508s (58.067s including waiting). Image size: 281790224 bytes. | ||
    Created container runner | ||
    Started container runner | ||
evaluation: | ||
  answer_relevancy: .5 | ||
  faithfulness: .5 | ||
  contextual_precision: .5 | ||
  contextual_recall: .5 | ||
  contextual_relevancy: .5 |
100 changes: 100 additions & 0 deletions
tests/fixtures/test_chat/8_multi_container_pod/kubectl_container_logs.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_container_logs","match_params":{"pod_name":"customer-orders-67889fd856-k94k7","container_name":"fastapi-app","namespace":"default"}} | ||
stdout: | ||
INFO: Started server process [1] | ||
INFO: Waiting for application startup. | ||
INFO: Application startup complete. | ||
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit) | ||
INFO: 10.244.0.16:46364 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:33610 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:47000 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:53562 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO: 10.244.0.16:59206 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Database call completed in 8.01 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 8.01 seconds. | ||
INFO: 127.0.0.1:34748 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:56156 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:41600 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:35976 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:35584 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO:app:Database call completed in 7.00 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 7.00 seconds. | ||
INFO: 127.0.0.1:59258 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:39944 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:39850 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:55216 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:51152 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO: 10.244.0.16:47072 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Database call completed in 9.00 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 9.00 seconds. | ||
INFO: 127.0.0.1:39504 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:42586 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:52628 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:38852 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:40626 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO:app:Database call completed in 7.01 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 7.01 seconds. | ||
INFO: 127.0.0.1:49094 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:34684 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:43422 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:49774 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:57556 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO: 10.244.0.16:58876 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Database call completed in 8.01 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 8.01 seconds. | ||
INFO: 127.0.0.1:45622 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:44866 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:54794 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:39550 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:49456 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO:app:Database call completed in 8.01 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 8.01 seconds. | ||
INFO: 127.0.0.1:55750 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:55426 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:55114 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:33410 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:40844 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Received request for checkout page. | ||
INFO:app:Connecting to promotions database to see if we should try to upsell user | ||
INFO:app:Connecting to database at None | ||
INFO:app:Fetching data using stored procedure: sp_CheckUserNotifications | ||
INFO: 10.244.0.16:38884 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO:app:Database call completed in 7.01 seconds. | ||
INFO:app:Promotions result: True | ||
INFO:app:Page rendered in 7.01 seconds. | ||
INFO: 127.0.0.1:50872 - "GET / HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:40396 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:40466 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:58458 - "GET /metrics HTTP/1.1" 200 OK | ||
INFO: 10.244.0.16:34996 - "GET /metrics HTTP/1.1" 200 OK | ||
|
||
stderr: |
65 changes: 65 additions & 0 deletions
tests/fixtures/test_chat/8_multi_container_pod/kubectl_logs.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_logs","match_params":{"name":"customer-orders-67889fd856-k94k7","namespace":"default"}} | ||
stdout: | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
<html> | ||
<head> | ||
<title>Checkout Status</title> | ||
</head> | ||
<body> | ||
<h1>Success!</h1> | ||
<p>Promotions: True</p> | ||
</body> | ||
</html> | ||
|
||
stderr: | ||
Defaulted container "curl-sidecar" out of: curl-sidecar, fastapi-app |
6 changes: 6 additions & 0 deletions
tests/fixtures/test_chat/8_multi_container_pod/kubectl_logs_grep.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{"toolset_name":"kubernetes/core","tool_name":"kubectl_logs_grep","match_params":{"name":"customer-orders-67889fd856-k94k7","namespace":"default","search_term":"render time"}} | ||
Command `kubectl logs customer-orders-67889fd856-k94k7 -n default | grep 'render time'` failed with return code 1 | ||
stdout: | ||
|
||
stderr: | ||
Defaulted container "curl-sidecar" out of: curl-sidecar, fastapi-app |
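The return code 1 here is what `grep` reports when no line matches: `kubectl logs` defaulted to the `curl-sidecar` container, whose HTML output (see the `kubectl_logs.txt` fixture) has no line containing `render time`. The stderr line does list the pod's containers, so a caller could use it to retry against a specific one. A minimal sketch of that parsing, where `containers_from_stderr` is a hypothetical helper and the message format is taken from the fixture above:

```python
import re

# Pattern for the kubectl notice shown in the fixture's stderr.
DEFAULTED = re.compile(r'Defaulted container "(?P<chosen>[^"]+)" out of: (?P<all>.+)')


def containers_from_stderr(stderr: str) -> tuple[str, list[str]]:
    """Return (container kubectl picked, all containers), or ("", [])."""
    match = DEFAULTED.search(stderr)
    if not match:
        return "", []
    names = [name.strip() for name in match.group("all").split(",")]
    return match.group("chosen"), names


stderr = 'Defaulted container "curl-sidecar" out of: curl-sidecar, fastapi-app'
chosen, names = containers_from_stderr(stderr)
others = [name for name in names if name != chosen]
print(chosen, others)
```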
6 changes: 6 additions & 0 deletions
tests/fixtures/test_chat/8_multi_container_pod/test_case.yaml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
user_prompt: "How are the page render times for pod customer-orders-67889fd856-k94k7" | ||
expected_output: "Page render times for `customer-orders-67889fd856-k94k7` range from 7.00 to 9.00 seconds." | ||
evaluation: | ||
  answer_relevancy: .5 | ||
  faithfulness: .5 | ||
  contextual_precision: 0 |
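The expected range can be checked against the `fastapi-app` log fixture, where each request logs a line of the form `Page rendered in N seconds.`. A small sketch of that arithmetic, using the seven values present in `kubectl_container_logs.txt` (the helper `render_time_range` is hypothetical):

```python
import re

RENDERED = re.compile(r"Page rendered in (\d+(?:\.\d+)?) seconds\.")


def render_time_range(log_text: str) -> tuple[float, float]:
    """Return (min, max) of all page render times found in the log."""
    times = [float(value) for value in RENDERED.findall(log_text)]
    if not times:
        raise ValueError("no render times found in log")
    return min(times), max(times)


# The seven render times logged in the fixture, in order of appearance.
log_text = "\n".join(
    f"INFO:app:Page rendered in {seconds} seconds."
    for seconds in ("8.01", "7.00", "9.00", "7.01", "8.01", "8.01", "7.01")
)
low, high = render_time_range(log_text)
print(f"{low:.2f} to {high:.2f} seconds")  # 7.00 to 9.00 seconds
```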