Open Design Questions for ScaleOut Planner #129
Open Design Issue for Multi-Simulation-Per-Group
Multi-Simulation-Per-Group Questions
Edge cases that we need to handle in the planner component when using the Multi-Simulation-Per-Group strategy, since our current node scoring formula does not, by itself, make globally optimal decisions:
Scoring formula:
score = ((# CPU units scheduled × cpu_weight) + (# GB of memory scheduled × mem_weight)) / price
Winner: the node pool with the highest score.
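A minimal Python sketch of the scoring rule follows. It assumes that pending pods are packed first-fit onto a single new node of each pool, because the issue does not say how the planner packs pods. `NodePool`, `Pod`, `pods_that_fit`, `score` and `winner` are hypothetical names, not the planner's API. With the Scenario 1 inputs it reproduces the first-run scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    cpu: int       # CPU units per node
    mem_gb: int    # memory per node, in GB
    price: float   # price per node


@dataclass(frozen=True)
class Pod:
    cpu: int
    mem_gb: int


def pods_that_fit(pool: NodePool, pods: list[Pod]) -> list[Pod]:
    """First-fit the pending pods onto one new node of `pool` (assumed packing)."""
    cpu_left, mem_left = pool.cpu, pool.mem_gb
    placed = []
    for pod in pods:
        if pod.cpu <= cpu_left and pod.mem_gb <= mem_left:
            cpu_left -= pod.cpu
            mem_left -= pod.mem_gb
            placed.append(pod)
    return placed


def score(pool, pods, cpu_weight=5, mem_weight=1):
    """((CPU scheduled * cpu_weight) + (memory scheduled * mem_weight)) / price."""
    placed = pods_that_fit(pool, pods)
    cpu = sum(p.cpu for p in placed)
    mem = sum(p.mem_gb for p in placed)
    return (cpu * cpu_weight + mem * mem_weight) / pool.price


def winner(pools, pods, cpu_weight=5, mem_weight=1):
    """The node pool with the highest score wins."""
    return max(pools, key=lambda pool: score(pool, pods, cpu_weight, mem_weight))


np1 = NodePool("np1", cpu=4, mem_gb=16, price=72)
np2 = NodePool("np2", cpu=8, mem_gb=32, price=120)
pods = [Pod(cpu=2, mem_gb=8)] * 3
print(score(np1, pods), score(np2, pods), winner([np1, np2], pods).name)  # 0.5 0.45 np1
```

The score only looks at one new node at a time, which is why it can prefer a pool that needs more nodes overall.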
Scenario 1
np1=>m.large (4CPU, 16GB, $72)
np2=>m.xlarge (8CPU, 32GB, $120)
3 pods: 2CPU, 8GB each
Assuming cpu_weight=5 and mem_weight=1
1st run
np1: 2 pods (4*5+16*1)=> 36/72=0.5 <-- winner
np2: 3 pods (6*5+24*1)=> 54/120=0.45
2nd run
1 pod remaining: 2 CPU, 8GB (weights unchanged, cpu_weight:mem_weight = 5:1)
np1: 1 pod (2*5+8*1)=> 18/72=0.25 <-- winner
np2: 1 pod (2*5+8*1)=> 18/120=0.15
Scaling decision: scale up 2 nodes in np1, costing $144 (2 × $72).
If instead one node had been brought up in np2 in the first run, all 3 pods (6 CPU, 24GB) would have fit on it, and the total cost would have been only $120.
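To make the gap between the greedy one-node-per-run loop and the cheapest overall plan concrete, here is a hedged sketch. It reuses the same assumed first-fit packing, repeats the scoring step until no pods are pending, and compares the result with a brute-force search over small multisets of new nodes. `greedy_plan`, `cheapest_plan` and `max_nodes` are illustrative assumptions, not the planner's API.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class NodePool:
    name: str
    cpu: int
    mem_gb: int
    price: float


@dataclass(frozen=True)
class Pod:
    cpu: int
    mem_gb: int


def pack(pool, pods):
    """First-fit pods onto one new node of `pool`; return (placed, remaining)."""
    cpu_left, mem_left = pool.cpu, pool.mem_gb
    placed, remaining = [], []
    for pod in pods:
        if pod.cpu <= cpu_left and pod.mem_gb <= mem_left:
            cpu_left -= pod.cpu
            mem_left -= pod.mem_gb
            placed.append(pod)
        else:
            remaining.append(pod)
    return placed, remaining


def score(pool, pods, cpu_weight, mem_weight):
    placed, _ = pack(pool, pods)
    return sum(p.cpu * cpu_weight + p.mem_gb * mem_weight for p in placed) / pool.price


def greedy_plan(pools, pods, cpu_weight=5, mem_weight=1):
    """Each run adds one node from the top-scoring pool, then re-scores the leftovers."""
    plan = []
    while pods:
        best = max(pools, key=lambda pool: score(pool, pods, cpu_weight, mem_weight))
        placed, pods = pack(best, pods)
        if not placed:
            raise ValueError("some pending pod fits on no node pool")
        plan.append(best)
    return plan


def cheapest_plan(pools, pods, max_nodes=4):
    """Brute force: cheapest multiset of up to `max_nodes` new nodes hosting every pod."""
    best = None
    for n in range(1, max_nodes + 1):
        for combo in combinations_with_replacement(pools, n):
            remaining = list(pods)
            for pool in combo:
                _, remaining = pack(pool, remaining)
            cost = sum(pool.price for pool in combo)
            if not remaining and (best is None or cost < best[0]):
                best = (cost, [pool.name for pool in combo])
    return best


np1 = NodePool("np1", cpu=4, mem_gb=16, price=72)
np2 = NodePool("np2", cpu=8, mem_gb=32, price=120)
pods = [Pod(cpu=2, mem_gb=8)] * 3

greedy = greedy_plan([np1, np2], pods)
print([p.name for p in greedy], sum(p.price for p in greedy))  # ['np1', 'np1'] 144
print(cheapest_plan([np1, np2], pods))                         # (120, ['np2'])
```

Brute force grows combinatorially with the number of pools and nodes, so it is shown only as a reference point for what a globally optimal decision would be, not as a proposed planner algorithm.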
Scenario 2
This scenario was encountered during the POC evaluations. Price values have been changed.
The node scoring strategy made a globally sub-optimal decision: it scaled up one more node than CA did, which resulted in higher costs.
In the first run, it scaled up a cheaper instance (an np1 node at $40 instead of an np2 node at $60). However, the remaining pod did not tolerate a taint on this np1 node, so in the second run it triggered the scale-up of an np2 node ($60), bringing the total to $100. Had a single np2 node been brought up in the first run, both pods would have been scheduled on it, for a total of $60.
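Scenario 2 can be checked in the same hedged style by adding taints to the packing step. The issue gives only the prices, so the node and pod shapes and the taint key `example-taint` below are hypothetical. They are chosen only so that np1 can host one pod and np2 can host both, as described above. Toleration matching is simplified to taint keys, ignoring values and effects, and `plan_cost` / `cheapest_plan` are illustrative helpers.

```python
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class NodePool:
    name: str
    cpu: int
    mem_gb: int
    price: float
    taints: frozenset = frozenset()        # taint keys on every new node


@dataclass(frozen=True)
class Pod:
    name: str
    cpu: int
    mem_gb: int
    tolerations: frozenset = frozenset()   # taint keys this pod tolerates


def tolerates(pod, pool):
    # Simplified: every taint key on the node must appear in the pod's tolerations.
    return pool.taints <= pod.tolerations


def pack(pool, pods):
    """First-fit pods onto one new node of `pool`; return the pods left over."""
    cpu_left, mem_left = pool.cpu, pool.mem_gb
    remaining = []
    for pod in pods:
        if tolerates(pod, pool) and pod.cpu <= cpu_left and pod.mem_gb <= mem_left:
            cpu_left -= pod.cpu
            mem_left -= pod.mem_gb
        else:
            remaining.append(pod)
    return remaining


def plan_cost(plan, pods):
    """Total price of `plan`, or None if some pod is still unscheduled."""
    remaining = list(pods)
    for pool in plan:
        remaining = pack(pool, remaining)
    return None if remaining else sum(pool.price for pool in plan)


def cheapest_plan(pools, pods, max_nodes=3):
    """Brute force over multisets of up to `max_nodes` new nodes."""
    best = None
    for n in range(1, max_nodes + 1):
        for combo in combinations_with_replacement(pools, n):
            cost = plan_cost(combo, pods)
            if cost is not None and (best is None or cost < best[0]):
                best = (cost, [pool.name for pool in combo])
    return best


# Hypothetical shapes; only the prices ($40, $60) come from the scenario.
np1 = NodePool("np1", cpu=2, mem_gb=8, price=40, taints=frozenset({"example-taint"}))
np2 = NodePool("np2", cpu=4, mem_gb=16, price=60)
pod_a = Pod("a", cpu=2, mem_gb=8, tolerations=frozenset({"example-taint"}))
pod_b = Pod("b", cpu=2, mem_gb=8)

print(plan_cost([np1, np2], [pod_a, pod_b]))      # 100: np1 first, then np2
print(cheapest_plan([np1, np2], [pod_a, pod_b]))  # (60, ['np2'])
```

Because pod `b` can never land on np1, any plan that starts with np1 must still add an np2 node, while np2 alone hosts both pods.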