unify e2e package options #47181
Conversation
Gitlab CI Configuration Changes

| Removed | Modified | Added | Renamed |
|---|---|---|---|
| 0 | 10 | 0 | 0 |

ℹ️ Diff available in the job log.
Files inventory check summary

File check results against ancestor cbddca27 for datadog-agent_7.79.0~devel.git.118.d123c26.pipeline.104179992-1_amd64.deb: no change detected.
Static quality checks

✅ 31 successful checks with minimal change (< 2 KiB). Please find below the results from static quality gates.

On-wire sizes (compressed)

Regression Detector Results

Baseline: c60cac7
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_cpu | % cpu utilization | -0.58 | [-3.57, +2.41] | 1 | Logs |
Fine details of change detection per experiment
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | docker_containers_memory | memory utilization | +1.64 | [+1.51, +1.78] | 1 | Logs |
| ➖ | quality_gate_logs | % cpu utilization | +1.26 | [-0.37, +2.89] | 1 | Logs bounds checks dashboard |
| ➖ | tcp_syslog_to_blackhole | ingress throughput | +1.05 | [+0.89, +1.21] | 1 | Logs |
| ➖ | file_tree | memory utilization | +0.47 | [+0.41, +0.53] | 1 | Logs |
| ➖ | ddot_metrics_sum_cumulative | memory utilization | +0.17 | [+0.03, +0.31] | 1 | Logs |
| ➖ | uds_dogstatsd_20mb_12k_contexts_20_senders | memory utilization | +0.09 | [+0.03, +0.15] | 1 | Logs |
| ➖ | file_to_blackhole_500ms_latency | egress throughput | +0.07 | [-0.33, +0.46] | 1 | Logs |
| ➖ | otlp_ingest_metrics | memory utilization | +0.06 | [-0.11, +0.23] | 1 | Logs |
| ➖ | quality_gate_idle_all_features | memory utilization | +0.06 | [+0.02, +0.10] | 1 | Logs bounds checks dashboard |
| ➖ | file_to_blackhole_0ms_latency | egress throughput | +0.05 | [-0.46, +0.56] | 1 | Logs |
| ➖ | file_to_blackhole_100ms_latency | egress throughput | +0.02 | [-0.08, +0.11] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api | ingress throughput | +0.00 | [-0.21, +0.21] | 1 | Logs |
| ➖ | tcp_dd_logs_filter_exclude | ingress throughput | -0.00 | [-0.11, +0.11] | 1 | Logs |
| ➖ | uds_dogstatsd_to_api_v3 | ingress throughput | -0.00 | [-0.20, +0.20] | 1 | Logs |
| ➖ | file_to_blackhole_1000ms_latency | egress throughput | -0.02 | [-0.47, +0.42] | 1 | Logs |
| ➖ | ddot_logs | memory utilization | -0.08 | [-0.14, -0.01] | 1 | Logs |
| ➖ | quality_gate_idle | memory utilization | -0.13 | [-0.18, -0.08] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics_sum_cumulativetodelta_exporter | memory utilization | -0.15 | [-0.37, +0.08] | 1 | Logs |
| ➖ | ddot_metrics_sum_delta | memory utilization | -0.15 | [-0.32, +0.01] | 1 | Logs |
| ➖ | quality_gate_metrics_logs | memory utilization | -0.25 | [-0.48, -0.02] | 1 | Logs bounds checks dashboard |
| ➖ | ddot_metrics | memory utilization | -0.44 | [-0.62, -0.27] | 1 | Logs |
| ➖ | otlp_ingest_logs | memory utilization | -0.45 | [-0.54, -0.36] | 1 | Logs |
| ➖ | docker_containers_cpu | % cpu utilization | -0.58 | [-3.57, +2.41] | 1 | Logs |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | docker_containers_cpu | simple_check_run | 10/10 | 702 ≥ 26 | |
| ✅ | docker_containers_memory | memory_usage | 10/10 | 274.63MiB ≤ 370MiB | |
| ✅ | docker_containers_memory | simple_check_run | 10/10 | 717 ≥ 26 | |
| ✅ | file_to_blackhole_0ms_latency | memory_usage | 10/10 | 0.19GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_0ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_1000ms_latency | memory_usage | 10/10 | 0.23GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_1000ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_100ms_latency | memory_usage | 10/10 | 0.20GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_100ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | file_to_blackhole_500ms_latency | memory_usage | 10/10 | 0.21GiB ≤ 1.20GiB | |
| ✅ | file_to_blackhole_500ms_latency | missed_bytes | 10/10 | 0B = 0B | |
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 174.75MiB ≤ 175MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 = 3 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 493.91MiB ≤ 550MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 204.51MiB ≤ 220MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 374.16 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 400.10MiB ≤ 475MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:

- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
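As a minimal sketch of the decision rule above (the thresholds come from this comment; the function name and the `erratic` flag representation are illustrative assumptions, not the detector's actual code):

```go
package main

import "fmt"

// isRegression applies the three criteria listed above: effect size of at
// least 5%, a 90% confidence interval that excludes zero, and the
// experiment not being marked "erratic" in its configuration.
func isRegression(deltaMeanPct, ciLo, ciHi float64, erratic bool) bool {
	bigEnough := deltaMeanPct >= 5.0 || deltaMeanPct <= -5.0
	ciExcludesZero := ciLo > 0 || ciHi < 0
	return bigEnough && ciExcludesZero && !erratic
}

func main() {
	// docker_containers_memory from the table: +1.64 [+1.51, +1.78].
	// The CI excludes zero, but the effect size is below the 5% tolerance.
	fmt.Println(isRegression(1.64, 1.51, 1.78, false)) // false

	// A hypothetical large, statistically significant drop would be flagged.
	fmt.Println(isRegression(-6.2, -8.0, -4.4, false)) // true
}
```

This is why none of the rows above are flagged: several intervals exclude zero, but no |Δ mean %| reaches the 5% effect-size tolerance.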
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
@codex review
💡 Codex Review

datadog-agent/test/new-e2e/tests/windows/common/agent/package.go, lines 249 to 250 in d4a5724
ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
jeremy-hanna left a comment:

👍 for agent-runtime owned files
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df95bd037c
What does this PR do?

Unifies the Windows E2E test package configuration under a consistent `{PREFIX}_*` environment variable convention (`CURRENT_AGENT_*`, `STABLE_AGENT_*`), replacing the legacy `WINDOWS_AGENT_*` and `LAST_STABLE_*` variables.

Key changes:

- `Resolve()`: Both `Package` (MSI) and `TestPackageConfig` (OCI) now defer URL/registry resolution to a `Resolve()` method called at the end of their constructors. This ensures all options and overrides are collected before any I/O, preventing premature failures and unnecessary network calls.
- `Lookup*FromEnv` functions: `LookupChannelFromEnv`, `LookupVersionFromEnv`, `LookupArchFromEnv`, `LookupFlavorFromEnv`, `LookupChannelURLFromEnv`, and `PackageFlavorEnvVar` are removed.
- `GetPackageFromEnv` and `GetLastStablePackageFromEnv` now accept optional `PackageOption` defaults.
- `WithURLFromPipeline`, `WithURLFromInstallersJSON`, and `WithMSIDevEnvOverrides` are removed in favor of the lazy `Resolve()` pattern.
- FIPS tests pass `WithFlavor("fips")` as a default option instead of manipulating env vars.
- `_ASSERT_VERSION` and `_ASSERT_PACKAGE_VERSION` are used only for test assertions (e.g., comparing expected Agent version output), not for package resolution.

Motivation
https://datadoghq.atlassian.net/browse/WINA-2382
Foundational work for running pipelines that can test upgrades from unreleased builds.
Unifying under a single convention simplifies the code, reduces duplication, and makes it straightforward to configure package sources consistently across all test types.
Separated assertion vs resolution variables

This separation is necessary for proper overrides. Test assertions depend on these variables being set, so using them for overrides as well would select the wrong package in some scenarios: for example, when trying to use the current pipeline build, the pipeline build itself would be overridden.
Describe how you validated your changes
This PR only changes e2e tests; existing E2E tests should pass.

- Added unit tests for the `setup_env` helper functions (`tasks/unit_tests/setup_env_tests.py`).
- The `setup_env` helper was updated with the new vars.

Additional Notes
- `BaseAgentInstallerSuite.SetupSuite()` now skips `GetPackageFromEnv()` if `AgentPackage` is already set, allowing suites to pre-configure the package before calling the base setup (used by FIPS tests).
- Upgrade tests (e.g., `TestAgentUpgradesFromGA`) use `WithDevEnvOverrides` (CI-guarded) so their hardcoded versions are not overridden in CI, but can still be overridden locally for development.
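The `SetupSuite()` skip behavior can be sketched as below. `BaseAgentInstallerSuite`, `AgentPackage`, and `GetPackageFromEnv` are named in the notes above; the surrounding types are simplified stand-ins for the real test/new-e2e code:

```go
package main

import "fmt"

// Package is a simplified stand-in for the real agent package type.
type Package struct{ Version string }

// BaseAgentInstallerSuite holds the package under test.
type BaseAgentInstallerSuite struct {
	AgentPackage *Package
}

// GetPackageFromEnv stands in for the env-based resolution path.
func GetPackageFromEnv() *Package {
	return &Package{Version: "from-env"}
}

// SetupSuite only resolves the package from the environment when a suite
// has not already pre-configured one (as the FIPS tests do).
func (s *BaseAgentInstallerSuite) SetupSuite() {
	if s.AgentPackage == nil {
		s.AgentPackage = GetPackageFromEnv()
	}
}

func main() {
	// A suite that pre-configures its package keeps it through setup.
	fips := &BaseAgentInstallerSuite{AgentPackage: &Package{Version: "7.55.0-fips"}}
	fips.SetupSuite()
	fmt.Println(fips.AgentPackage.Version) // 7.55.0-fips

	// A suite without a pre-set package falls back to the env lookup.
	plain := &BaseAgentInstallerSuite{}
	plain.SetupSuite()
	fmt.Println(plain.AgentPackage.Version) // from-env
}
```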