Open Design Questions for ScaleOut Planner #129

@elankath

Open Design Issue for Multi-Simulation-Per-Group

Multi-Simulation-Per-Group Questions

Edge cases that the planner component needs to handle when using the Multi-Simulation-Per-Group strategy, since the current node scoring formula does not by itself make globally optimal decisions:

Scoring formula:
score = ((# units of cpu scheduled x cpu_weight) + (# units of memory scheduled x mem_weight)) / price

winning score => whichever node pool has the highest score
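
As a minimal sketch of the formula above (the function name `node_score` and its parameter names are hypothetical, not taken from the planner code), the score can be computed like this, assuming CPU is counted in cores and memory in GB as in the scenarios below:

```python
def node_score(cpu_scheduled, mem_scheduled, price, cpu_weight=5, mem_weight=1):
    """Weighted amount of resources scheduled on a node, per unit of price.

    The default weights mirror the 5:1 CPU:memory weighting assumed in
    Scenario 1; they are illustrative defaults, not planner settings.
    """
    return (cpu_scheduled * cpu_weight + mem_scheduled * mem_weight) / price


# Scenario 1, 1st run: np1 fits 2 pods (4 CPU, 16 GB), np2 fits 3 pods (6 CPU, 24 GB).
np1_score = node_score(4, 16, 72)    # 36 / 72
np2_score = node_score(6, 24, 120)   # 54 / 120
winner = "np1" if np1_score > np2_score else "np2"
```

With these inputs `winner` is `np1`, matching the first run of Scenario 1.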

Scenario 1

np1=>m.large (4CPU, 16GB, $72)
np2=>m.xlarge (8CPU, 32GB, $120)

3 pods: 2CPU 8GB each

Assuming cpu_weight=5, and mem_weight=1

1st run
np1: 2 pods (4*5+16*1)=> 36/72=0.5 <-- winner
np2: 3 pods (6*5+24*1)=> 54/120=0.45 

2nd run
1 pod remaining: 2CPU, 8GB

np1: 1 pod (2*5+8*1)=> 18/72=0.25 <-- winner
np2: 1 pod (2*5+8*1)=> 18/120=0.15 

Scaling decision: scale up 2 nodes in np1, costing $144.
If one np2 node had instead been brought up in the first run, all 3 pods would have fit on it and the total cost would have been only $120.
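
The greedy behaviour in Scenario 1 can be reproduced with a small simulation. This is only a sketch under stated assumptions: the names `NodePool`, `pack`, `score` and `greedy_scale_up` are hypothetical, pods are packed first-fit onto a single fresh node per pool per run, and only CPU and memory are considered (no taints, affinities, or existing nodes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    name: str
    cpu: int       # cores per node
    mem_gb: int    # GB per node
    price: float


def pack(pool, pods):
    """First-fit pods (cpu, mem_gb) onto one fresh node; return (placed, remaining)."""
    cpu_left, mem_left = pool.cpu, pool.mem_gb
    placed, remaining = [], []
    for cpu, mem in pods:
        if cpu <= cpu_left and mem <= mem_left:
            cpu_left -= cpu
            mem_left -= mem
            placed.append((cpu, mem))
        else:
            remaining.append((cpu, mem))
    return placed, remaining


def score(pool, placed, cpu_weight=5, mem_weight=1):
    """Scoring formula from this issue, applied to the pods placed on one node."""
    cpu = sum(c for c, _ in placed)
    mem = sum(m for _, m in placed)
    return (cpu * cpu_weight + mem * mem_weight) / pool.price


def greedy_scale_up(pools, pods):
    """Each run picks the pool whose single new node scores highest, then repeats."""
    chosen, pending = [], list(pods)
    while pending:
        best = None
        for pool in pools:
            placed, remaining = pack(pool, pending)
            if placed and (best is None or score(pool, placed) > best[0]):
                best = (score(pool, placed), pool, remaining)
        if best is None:  # no pool can host any pending pod
            break
        chosen.append(best[1])
        pending = best[2]
    return chosen


pools = [NodePool("np1", 4, 16, 72), NodePool("np2", 8, 32, 120)]
pods = [(2, 8)] * 3  # 3 pods: 2CPU 8GB each

plan = greedy_scale_up(pools, pods)
plan_names = [p.name for p in plan]       # ["np1", "np1"]
total_cost = sum(p.price for p in plan)   # 144
```

The greedy plan selects np1 twice for $144, while a single np2 node at $120 would have hosted all three pods, which is the gap this scenario illustrates.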

Scenario 2

This scenario was encountered during the POC evaluations. Price values have been changed.

The node scoring strategy resulted in a globally sub-optimal decision: it scaled up one more node than CA did, which resulted in higher costs.
In the first run, it scaled up the cheaper instance (an np1 node [$40] instead of an np2 node [$60]). In the second run, however, the remaining pod could not be scheduled on that np1 node because of an untolerated taint, so it triggered a scale-up of an np2 [$60] node. Had a single np2 node been brought up in the first run, both pods would have been scheduled on it.
