Aggregate CPU and Memory across all nodes instead of per node to fix AWS jobs #3844
ronaldngounou wants to merge 1 commit into kubernetes:master
Conversation
/test pull-perf-tests-ec2-master-scale-performance-5000

/test pull-perf-tests-ec2-master-scale-performance-100

Running only ec2-master-scale-performance-100 is sufficient. We only run the 5000-node test to debug performance issues.

/assign @mengqiy

Example of what @ronaldngounou is mentioning: the result is that we cannot see a trend in AWS, since each execution will create different nodes. @ronaldngounou, can we make this change backward compatible, so that instead of replacing the existing resources, we add a new "fake" resource that we call "aggregated" or "summary" or something like that?

Sounds good @aojea. I will address that note and update my PR.
Force-pushed: db52e1a to bcd2372, bc81329 to a0c7cac, a0c7cac to 4d72c6d
/test pull-perf-tests-ec2-master-scale-performance-100

/lgtm Thanks

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: aojea, ronaldngounou. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.

/test pull-perf-tests-ec2-master-scale-performance-100
…WS jobs Signed-off-by: Ronald Ngounou <ronald.ngounou@yahoo.com>
Force-pushed: 4d72c6d to 2e0371a
New changes are detected. LGTM label has been removed.
Overview
Contributes to issue #3843
Contributes to the naming-inconsistency issue with the AWS jobs, which prevents LoadResources from being meaningful because it does not take the average across all the nodes.
Two possibilities were considered as a fix:
Option 1: Arrange for the AWS nodes to have well-defined names.
Option 2: Change the metrics gathering/display to just average over all the nodes without regard to their names.
GCE tests don't have the same issue because they use more consistent naming for the nodes, whereas AWS logs the instance ID in the node name, which makes the resource usage inconsistent to track across runs. Therefore, this PR averages CPU/Memory across all the nodes.
In this PR, option 2 was selected.
Updated parseResourceUsageData to take the average across the nodes.

Testing

Deployed perfdash on localhost:8080 after following https://github.com/kubernetes/perf-tests/blob/master/perfdash/README.md

[done] Confirming that CL2 generates the desired result files:
The artifacts do not contain the new resourceUsageAggregate struct, because this change does not modify type resourceUsagePercentiles map[string][]resourceUsage; rather, it adds type usageAggregateAtPercentiles map[string]resourceUsageAggregate, which is the new struct computed. I am not modifying the struct below, which contains the resources.
perf-tests/perfdash/parser.go, line 135 in 8d22586
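To illustrate the approach, here is a minimal Go sketch of averaging CPU/memory over all nodes at each percentile, ignoring node names (option 2). Only the type names resourceUsagePercentiles, resourceUsageAggregate, and usageAggregateAtPercentiles come from this PR; the field names and sample values are assumptions for illustration, not the actual perfdash code.

```go
package main

import "fmt"

// Per-node usage sample at a given percentile (fields assumed for this sketch).
type resourceUsage struct {
	Name string  // node name, e.g. an AWS instance-id-based name
	CPU  float64 // cores
	Mem  float64 // bytes
}

// Percentile label -> samples from every node (type name from the PR).
type resourceUsagePercentiles map[string][]resourceUsage

// Aggregated usage across all nodes (type name from the PR, fields assumed).
type resourceUsageAggregate struct {
	CPUAvg float64
	MemAvg float64
}

// Percentile label -> a single aggregate (type name from the PR).
type usageAggregateAtPercentiles map[string]resourceUsageAggregate

// aggregate averages CPU/memory over all nodes at each percentile,
// without regard to the node names.
func aggregate(p resourceUsagePercentiles) usageAggregateAtPercentiles {
	out := make(usageAggregateAtPercentiles, len(p))
	for perc, samples := range p {
		if len(samples) == 0 {
			continue
		}
		var agg resourceUsageAggregate
		for _, s := range samples {
			agg.CPUAvg += s.CPU
			agg.MemAvg += s.Mem
		}
		agg.CPUAvg /= float64(len(samples))
		agg.MemAvg /= float64(len(samples))
		out[perc] = agg
	}
	return out
}

func main() {
	// Two nodes whose names would differ on every AWS run.
	data := resourceUsagePercentiles{
		"Perc99": {
			{Name: "ip-10-0-1-23.ec2.internal", CPU: 0.25, Mem: 100},
			{Name: "ip-10-0-4-87.ec2.internal", CPU: 0.75, Mem: 300},
		},
	}
	agg := aggregate(data)
	fmt.Println(agg["Perc99"].CPUAvg, agg["Perc99"].MemAvg) // 0.5 200
}
```

Because the output is keyed only by percentile, the same "all-nodes" series stays comparable across runs even when every node name changes.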
However, after running perfdash locally, I'm able to see the aggregation across all the nodes.
Result after the PR:
URL = http://localhost:8080/#/?jobname=gce-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&Resource=CPU_Average&PodName=all-nodes

(Dashboard screenshots from before and after the change were attached to the PR.)
The all-nodes entry is more useful for AWS jobs, as the node names differ between runs.