1. Instances are launched only in `launch_ec2_instances`. Therefore, at the end of that function, this commit appends the launched instance ids to a file to keep track of them.
2. `get_cluster_instances` used to rely solely on retrieving instances by the tags `parallelcluster:cluster-name` and `parallelcluster:node-type` (the `tag:parallelcluster:node-type` filter). It stops working when users manually remove tags from instances. In addition to getting instances by tags, this commit also gets instances from the file written by `launch_ec2_instances`. At the end of the function, this commit removes instances that no longer exist from the file.
3. Remove the argument `alive_states_only` for code simplicity.
With this commit, cluster scaling should work with or without tags. The tag-based lookup and the file-based lookup are deliberately redundant, which increases resilience.
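The record, merge, and prune flow described above can be sketched as follows. This is a minimal illustration, not the code in this commit: the helper names (`record_launched_instances`, `read_tracked_instances`, `merge_and_prune`) are hypothetical, and the set of instances that still exist is passed in as a plain argument, whereas the real code would presumably obtain it from EC2.

```python
from pathlib import Path
from typing import Iterable, Set


def record_launched_instances(tracking_file: Path, instance_ids: Iterable[str]) -> None:
    """Append newly launched instance ids to the tracking file, one per line."""
    with tracking_file.open("a") as f:
        for instance_id in instance_ids:
            f.write(f"{instance_id}\n")


def read_tracked_instances(tracking_file: Path) -> Set[str]:
    """Return the ids currently recorded in the tracking file (empty set if the file is missing)."""
    if not tracking_file.exists():
        return set()
    return {line.strip() for line in tracking_file.read_text().splitlines() if line.strip()}


def merge_and_prune(tracking_file: Path, tagged_ids: Set[str], existing_ids: Set[str]) -> Set[str]:
    """Combine tag-based and file-based results, then drop stale ids from the file.

    tagged_ids:   ids found through the tag filters (assumed to come from EC2).
    existing_ids: ids that still exist (assumption: supplied by the caller).
    """
    tracked = read_tracked_instances(tracking_file)
    live_tracked = tracked & existing_ids
    # Rewrite the file so that it only lists instances that still exist.
    tracking_file.write_text("".join(f"{i}\n" for i in sorted(live_tracked)))
    return set(tagged_ids) | live_tracked
```

Because the result is the union of both sources, an instance whose tags were removed is still returned as long as its id is in the file, and an instance missing from the file is still returned as long as it keeps its tags.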
This commit is work-in-progress for the following reasons:
1. Code in `get_cluster_instances` could be simplified
2. Requires changes in CLI and Cookbook
a. In the CLI, the IAM policies for the head node and the cleanup Lambda currently contain
```
{
    "Action": "ec2:TerminateInstances",
    "Resource": "*",
    "Effect": "Allow",
    "Sid": "EC2Terminate",
    "Condition": {
        "StringEquals": {
            "ec2:ResourceTag/parallelcluster:cluster-name": <cluster name>
        }
    }
},
```
This statement should be changed to
```
{
    "Action": "ec2:TerminateInstances",
    "Resource": "*",
    "Effect": "Allow",
    "Sid": "EC2Terminate",
    "Condition": {
        "StringEquals": {
            "ec2:ResourceTag/aws:ec2launchtemplate:id": [
                <Launch template id 1>,
                <Launch template id 2>,
                ... It should contain all launch templates of compute and login nodes.
            ]
        }
    }
},
```
b. The Cookbook should create the file `/etc/parallelcluster/slurm_plugin/running_nodes` during the config stage and set its owner to `pcluster-admin:pcluster-admin`. This is necessary because the node package does not have permission to create the file.