Releases: aws/aws-parallelcluster-cookbook
AWS ParallelCluster v3.5.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.5.0
This is associated with AWS ParallelCluster v3.5.0
ENHANCEMENTS
- Fail cluster creation if cluster status changes to PROTECTED while provisioning static nodes.
CHANGES
- Upgrade Slurm to version 22.05.8.
- Upgrade EFA installer to 1.21.0.
  - Efa-driver: efa-2.1.1-1
  - Efa-config: efa-config-1.12-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.1amzn3.0-1
  - Rdma-core: rdma-core-43.0-1
  - Open MPI: openmpi40-aws-4.1.4-3
- Make Slurm controller logs more verbose and enable additional logging for the Slurm power save plugin.
BUG FIXES
- Fix an issue where custom AMI creation failed on Ubuntu 20.04 during MySQL package installation.
AWS ParallelCluster v3.4.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.1
This is associated with AWS ParallelCluster v3.4.1
BUG FIXES
- Fix an issue with the Slurm scheduler that might incorrectly apply updates to its internal registry of compute nodes. This might cause EC2 instances to become inaccessible or to be backed by an incorrect instance type.
AWS ParallelCluster v3.4.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.4.0
This is associated with AWS ParallelCluster v3.4.0
ENHANCEMENTS
- Add support for specifying multiple subnets for each queue to increase the EC2 capacity pool available for use.
CHANGES
- Upgrade EFA installer to 1.20.0.
  - Efa-driver: efa-2.1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.1
  - Rdma-core: rdma-core-43.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
- Mount EFS file systems using amazon-efs-utils. EFS file systems can be mounted using in-transit encryption and an IAM authorized user.
- Install stunnel 5.67 on CentOS7 and Ubuntu to support EFS in-transit encryption.
- Add possibility to execute a custom script in the head node during the update of the cluster.
- Upgrade Slurm to version 22.05.6.
- Upgrade Python to 3.9.16 and 3.7.16.
AWS ParallelCluster v2.11.9
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.9
This is associated with AWS ParallelCluster v2.11.9
CHANGES
- There were no notable changes for this version.
AWS ParallelCluster v3.3.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.1
This is associated with AWS ParallelCluster v3.3.1
CHANGES
- There were no changes for this version.
AWS ParallelCluster v3.1.5
We're excited to announce the release of AWS ParallelCluster Cookbook 3.1.5
This is associated with AWS ParallelCluster v3.1.5
CHANGES
- Upgrade EFA installer to 1.18.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
BUG FIXES
- Fix a Slurm issue that prevents termination of idle nodes.
AWS ParallelCluster v2.11.8
We're excited to announce the release of AWS ParallelCluster Cookbook 2.11.8
This is associated with AWS ParallelCluster v2.11.8
CHANGES
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade EFA installer to 1.19.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-3
AWS ParallelCluster v3.3.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.3.0
This is associated with AWS ParallelCluster v3.3.0
ENHANCEMENTS
- Add support for Slurm Accounting.
- Add support for adding and removing shared storage at cluster update.
- Add possibility to specify multiple instance types for the same compute resource.
- Configure NFS threads to be min(256, max(8, num_cores * 4)) to ensure better stability and performance.
- Move NFS installation to build time to reduce configuration time.
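The NFS thread rule above, min(256, max(8, num_cores * 4)), can be sketched in Python as follows. This is only an illustration of the formula, not the cookbook's actual recipe code: the function name nfs_thread_count is hypothetical, and using os.cpu_count() as the value of num_cores is an assumption.

```python
import os


def nfs_thread_count(num_cores: int) -> int:
    """Apply the rule min(256, max(8, num_cores * 4)).

    Hypothetical helper: small nodes get at least 8 threads,
    large nodes are capped at 256 threads.
    """
    return min(256, max(8, num_cores * 4))


if __name__ == "__main__":
    # Assumption: the local CPU count stands in for num_cores.
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> {nfs_thread_count(cores)} NFS threads")
```

Under this rule a 1-core or 2-core node gets the floor of 8 threads, a 4-core node gets 16, and any node with 64 or more cores is capped at 256.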
CHANGES
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Upgrade NVIDIA CUDA Toolkit to version 11.7.1.
- Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
- Reduce timeout from 50 to a maximum of 5 minutes in case of DynamoDB connection issues at compute node bootstrap.
- Change the logic to number the routing tables when an instance has multiple NICs.
- Upgrade Python from 3.7.13 to 3.9.15.
- Upgrade Slurm to version 22.05.5.
- Upgrade EFA installer to 1.18.0.
  - Efa-driver: efa-1.16.0-1
  - Efa-config: efa-config-1.11-1
  - Efa-profile: efa-profile-1.5-1
  - Libfabric-aws: libfabric-aws-1.16.0~amzn4.0-1
  - Rdma-core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Upgrade NICE DCV to version 2022.1-13300.
  - server: 2022.1.13300-1
  - xdcv: 2022.1.433-1
  - gl: 2022.1.973-1
  - web_viewer: 2022.1.13300-1
- Upgrade third-party cookbook dependencies:
  - selinux-6.0.5 (from selinux-6.0.4)
  - nfs-5.0.0 (from nfs-2.6.4)
AWS ParallelCluster v3.2.1
We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.1
This is associated with AWS ParallelCluster v3.2.1
ENHANCEMENTS
- Improve the logic to associate the host routing tables to the different network cards to better support EC2 instances with several NICs.
CHANGES
- Upgrade NVIDIA driver to version 470.141.03.
- Upgrade NVIDIA Fabric Manager to version 470.141.03.
- Pin cfn-bootstrap helper package version to 2.0-10.
- Disable cron job tasks man-db and mlocate, which may have a negative impact on node performance.
- Upgrade Intel MPI Library to 2021.6.0.602.
- Upgrade Python from 3.7.10 to 3.7.13 in response to a security risk.
AWS ParallelCluster v3.2.0
We're excited to announce the release of AWS ParallelCluster Cookbook 3.2.0
This is associated with AWS ParallelCluster v3.2.0
ENHANCEMENTS
- Add support for multiple Elastic File Systems.
- Add support for multiple FSx file systems.
- Add support for attaching existing FSx for ONTAP and FSx for OpenZFS file systems.
- Install NVIDIA GDRCopy 2.3 to enable low-latency GPU memory copy on supported instance types.
- During cluster update, set Slurm node state according to the strategy set through the configuration parameter Scheduling/SchedulerSettings/QueueUpdateStrategy.
- Add support for memory-based scheduling in Slurm.
  - Configure RealMemory on compute nodes by default as 95% of the EC2 memory.
  - Move SelectTypeParameters to slurm_parallelcluster.conf include file.
  - Move ConstrainRAMSpace to slurm_parallelcluster_cgroup.conf include file.
  - Add support for new configuration parameter Scheduling/SlurmSettings/EnableMemoryBasedScheduling to configure memory-based scheduling in Slurm.
  - Add support for new configuration parameter Scheduling/SlurmQueues/ComputeResources/SchedulableMemory to override default value of the memory seen by the scheduler on compute nodes.
- Add support for rebooting compute nodes via Slurm.
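The default RealMemory rule listed above (95% of the EC2 memory, overridable through SchedulableMemory) can be illustrated with a short Python sketch. The function name, the MiB unit, and rounding down to a whole number are assumptions made for the example; the release notes do not state how the cookbook rounds the value.

```python
from typing import Optional


def default_real_memory_mib(ec2_memory_mib: int,
                            schedulable_memory_mib: Optional[int] = None) -> int:
    """Return the memory (MiB) the scheduler would see for a compute node.

    Hypothetical helper: an explicit SchedulableMemory-style override wins;
    otherwise 95% of the instance memory is used, rounded down (assumption).
    """
    if schedulable_memory_mib is not None:
        return schedulable_memory_mib
    # Integer arithmetic avoids floating-point rounding surprises.
    return ec2_memory_mib * 95 // 100


if __name__ == "__main__":
    print(default_real_memory_mib(16384))         # default: 95% of 16 GiB
    print(default_real_memory_mib(16384, 14000))  # explicit override
```

Keeping a margin below the full instance memory leaves room for the operating system and daemons, which is presumably why the default is not 100%.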
CHANGES
- Restart clustermgtd and slurmctld daemons at cluster update time only when Scheduling parameters are updated in the cluster configuration.
- Update slurmctld and slurmd systemd service files.
- Upgrade NICE DCV to version 2022.0-12760.
- Upgrade NVIDIA driver to version 470.129.06.
- Upgrade NVIDIA Fabric Manager to version 470.129.06.
- Upgrade EFA installer to version 1.17.2.
  - EFA driver: efa-1.16.0-1
  - EFA configuration: efa-config-1.10-1
  - EFA profile: efa-profile-1.5-1
  - Libfabric: libfabric-aws-1.16.0~amzn2.0-1
  - RDMA core: rdma-core-41.0-2
  - Open MPI: openmpi40-aws-4.1.4-2
- Restrict IPv6 access to IMDS to root and cluster admin users only, when configuration parameter HeadNode/Imds/Secured is enabled.
- Set Slurm configuration AuthInfo=cred_expire=70 to reduce the time requeued jobs must wait before starting again when nodes are not available.
- Move SelectTypeParameters and ConstrainRAMSpace to the parallelcluster_slurm*.conf include files.
- Upgrade third-party cookbook dependencies:
  - apt-7.4.2 (from apt-7.4.0)
  - line-4.5.2 (from line-4.0.1)
  - openssh-2.10.3 (from openssh-2.9.1)
  - pyenv-3.5.1 (from pyenv-3.4.2)
  - selinux-6.0.4 (from selinux-3.1.1)
  - yum-7.4.0 (from yum-6.1.1)
  - yum-epel-4.5.0 (from yum-epel-4.1.2)
- Disable aws-ubuntu-eni-helper service, available in Deep Learning AMIs, to avoid conflicts with configure_nw_interface.sh when configuring instances with multiple network cards.
- Set MTU to 9001 for all the network interfaces when configuring instances with multiple network cards.
- Remove the trailing dot when configuring the compute node FQDN.
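As an illustration of the trailing-dot change in the last item, the sketch below normalizes a fully qualified domain name by dropping a single final dot. It is a minimal example, not the cookbook's code: the function name and the sample host names are hypothetical.

```python
def strip_trailing_dot(fqdn: str) -> str:
    """Return the FQDN without its final root-label dot, if present.

    Hypothetical helper: only one trailing dot is removed, and names
    without a trailing dot are returned unchanged.
    """
    return fqdn[:-1] if fqdn.endswith(".") else fqdn


if __name__ == "__main__":
    # Sample names are made up for the example.
    print(strip_trailing_dot("queue1-st-c5-1.cluster.pcluster."))
    print(strip_trailing_dot("queue1-st-c5-1.cluster.pcluster"))
```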