Commit 16a5239

Merge branch 'develop' into develop
2 parents b5b723d + e49c40e commit 16a5239

207 files changed, +15635 -3874 lines changed


.github/workflows/ci.yml

Lines changed: 8 additions & 1 deletion
@@ -94,6 +94,13 @@ jobs:
 run: pip install tox
 - name: Run Tox
 run: cd ${{ matrix.toxdir }} && tox -e ${{ matrix.toxenv }}
+- name: Upload code coverage report to Codecov
+uses: codecov/codecov-action@v3
+if: ${{ endsWith(matrix.toxenv, '-cov') }}
+with:
+files: cli/coverage.xml
+flags: unittests
+verbose: true
 awsbatch-cli-tests:
 name: AWS Batch CLI Tests
 runs-on: ${{ matrix.os }}
@@ -169,7 +176,7 @@ jobs:
 runs-on: ubuntu-latest
 steps:
 - uses: actions/checkout@v2
-- uses: mikefarah/yq@v4.6.3
+- uses: mikefarah/yq@v4.32.2
 - run: api/docker/awslambda/docker-build.sh
 shellcheck:
 name: Shellcheck
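
Review note on the new Codecov step in `ci.yml`: its `if:` condition uploads coverage only for matrix entries whose `toxenv` ends in `-cov`. The sketch below restates that condition in Python so the selection is easy to check. The environment names are hypothetical and not taken from this repository's matrix; the lowercase conversion reflects that GitHub's `endsWith()` is not case sensitive.

```python
def should_upload_coverage(toxenv: str) -> bool:
    """Restate the step's `if:` condition: true when toxenv ends in '-cov'.

    GitHub's endsWith() ignores case, so the value is lowercased first.
    """
    return toxenv.lower().endswith("-cov")


# Hypothetical matrix values, for illustration only.
for env in ["py39-cov", "py310-nocov", "code-linters"]:
    print(env, should_upload_coverage(env))
```

Only entries for which the function returns `True` would reach the upload step; all other tox environments skip it.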

.github/workflows/codeql-analysis.yml

Lines changed: 2 additions & 2 deletions
@@ -22,9 +22,9 @@ jobs:
 - name: Checkout repository
 uses: actions/checkout@v2
 - name: Initialize CodeQL
-uses: github/codeql-action/init@v1
+uses: github/codeql-action/init@v2
 with:
 languages: ${{ matrix.language }}
 queries: +security-and-quality
 - name: Perform CodeQL Analysis
-uses: github/codeql-action/analyze@v1
+uses: github/codeql-action/analyze@v2

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ report.html
 tests_outputs/
 .python-version
 test.yaml
+.vscode

.pre-commit-config.yaml

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ repos:
 - id: check-symlinks
 - id: end-of-file-fixer
 - id: pretty-format-json
+args: ['--autofix']
 - id: requirements-txt-fixer
 - id: mixed-line-ending
 args: ['--fix=no']

CHANGELOG.md

Lines changed: 64 additions & 4 deletions
@@ -4,16 +4,58 @@ CHANGELOG
 3.6.0
 ----
 **ENHANCEMENTS**
+- Add a CloudFormation custom resource for creating and managing clusters from CloudFormation.
 - Add `mem_used_percent` and `disk_used_percent` metrics for head node memory and root volume disk utilization tracking on the ParallelCluster CloudWatch dashboard, and set up alarms for monitoring these metrics.
-
-**ENHANCEMENTS**
 - Add log rotation support for ParallelCluster managed logs.
+- Track common errors of compute nodes on Cloudwatch Dashboard.
+- Increase the limit on the maximum number of queues per cluster from 10 to 100. Each cluster can however have a maximum number of 150 compute resources and each queue can have a maximum of 40 compute resources.
+- Allow to specify a sequence of multiple custom actions scripts per event.
+- Add support for customizing the cluster Slurm configuration via the ParallelCluster configuration YAML file.
+- Track the longest dynamic node idle time in CloudWatch Dashboard.
+- Add new configuration section `HealthChecks/Gpu` for enabling the GPU Health Check in the compute node before job execution.
+- Add support for `DetailedMonitoring` in the `Monitoring` section.
+- Add support for `Tags` in the `SlurmQueues` and `SlurmQueues/ComputeResources` section.
+- Build Slurm with support for LUA.

 **CHANGES**
 - Increase the default `RetentionInDays` of CloudWatch logs from 14 to 180 days.
+- Set Slurm prolog and epilog configurations to target a directory, /opt/slurm/etc/scripts/prolog.d/ and /opt/slurm/etc/scripts/epilog.d/ respectively.
+- Upgrade Slurm to version 23.02.1.
+- Upgrade munge to version 0.5.15.
+- Upgrade image used by CodeBuild environment when building container images for AWS Batch clusters, from
+`aws/codebuild/amazonlinux2-x86_64-standard:3.0` to `aws/codebuild/amazonlinux2-x86_64-standard:4.0` and from
+`aws/codebuild/amazonlinux2-aarch64-standard:1.0` to `aws/codebuild/amazonlinux2-aarch64-standard:2.0`.

 **BUG FIXES**
 - Fix EFS, FSx network security groups validators to avoid reporting false errors.
+- Fix missing tagging of resources created by ImageBuilder during the `build-image` operation.
+- Fix Update policy for MaxCount to always perform numerical comparisons on MaxCount property.
+- Fix IP association on instances with multiple network cards.
+- Fix replacement of StoragePass in slurm_parallelcluster_slurmdbd.conf when a queue parameter update is performed and the Slurm accounting configurations are not updated.
+
+3.5.1
+-----
+**ENHANCEMENTS**
+- Add a new way to distribute ParallelCluster as a self-contained executable shipped with a dedicated installer.
+- Add support for US isolated region us-isob-east-1.
+
+**CHANGES**
+- Upgrade EFA installer to `1.22.0`
+- Efa-driver: `efa-2.1.1g`
+- Efa-config: `efa-config-1.13-1`
+- Efa-profile: `efa-profile-1.5-1`
+- Libfabric-aws: `libfabric-aws-1.17.0-1`
+- Rdma-core: `rdma-core-43.0-1`
+- Open MPI: `openmpi40-aws-4.1.5-1`
+- Upgrade NICE DCV to version `2022.2-14521`.
+- server: `2022.2.14521-1`
+- xdcv: `2022.2.519-1`
+- gl: `2022.2.1012-1`
+- web_viewer: `2022.2.14521-1`
+
+**BUG FIXES**
+- Fix update cluster to remove shared EBS volumes can potentially cause node launching failures if `MountDir` match the same pattern in `/etc/exports`.
+- Fix for compute_console_output log file being truncated at every clustermgtd iteration.

 3.5.0
 -----
@@ -23,7 +65,6 @@ CHANGELOG
 - Add a Python library to allow customers to use ParallelCluster functionalities in their own code.
 - Add logging of compute node console output to CloudWatch on compute node bootstrap failure.
 - Add failures field containing failure code and reason to `describe-cluster` output when cluster creation fails.
-- Add support for US isolated regions: us-iso-* and us-isob-*.

 **CHANGES**
 - Upgrade Slurm to version 22.05.8.
@@ -204,6 +245,25 @@ CHANGELOG
 - Fix ParallelCluster API stack update failure when upgrading from a previus version. Add resource pattern used for the `ListImagePipelineImages` action in the `EcrImageDeletionLambdaRole`.
 - Fix ParallelCluster API adding missing permissions needed to import/export from S3 when creating an FSx for Lustre storage.

+3.1.5
+------
+
+**CHANGES**
+- Upgrade EFA installer to `1.18.0`
+- Efa-driver: `efa-1.16.0-1`
+- Efa-config: `efa-config-1.11-1`
+- Efa-profile: `efa-profile-1.5-1`
+- Libfabric-aws: `libfabric-aws-1.16.0~amzn4.0-1`
+- Rdma-core: `rdma-core-41.0-2`
+- Open MPI: `openmpi40-aws-4.1.4-2`
+- Add `lambda:ListTags` and `lambda:UntagResource` to `ParallelClusterUserRole` used by ParallelCluster API stack for cluster update.
+- Upgrade Intel MPI Library to 2021.6.0.602.
+- Upgrade NVIDIA driver to version 470.141.03.
+- Upgrade NVIDIA Fabric Manager to version 470.141.03.
+
+**BUG FIXES**
+- Fix Slurm issue that prevents idle nodes termination.
+
 3.1.4
 ------

@@ -680,7 +740,7 @@ CHANGELOG
 - Improve retrieval of instance type info by using `DescribeInstanceType` API.
 - Remove `custom_awsbatch_template_url` configuration parameter.
 - Upgrade `pip` to latest version in virtual environments.
-- Upgrade image used by CodeBuild environment when building container images for Batch clusters, from
+- Upgrade image used by CodeBuild environment when building container images for AWS Batch clusters, from
 `aws/codebuild/amazonlinux2-x86_64-standard:1.0` to `aws/codebuild/amazonlinux2-x86_64-standard:3.0`.

 **BUG FIXES**
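
Review note on the 3.6.0 CHANGELOG entry about queue limits: it states three separate bounds (100 queues per cluster, 40 compute resources per queue, 150 compute resources per cluster). The sketch below shows how the three interact. It is an illustration only, not ParallelCluster's validator; the function name, constant names, and the input shape (queue name mapped to its number of compute resources) are assumptions.

```python
from typing import Dict, List

# Limits as stated in the 3.6.0 CHANGELOG entry.
MAX_QUEUES = 100                 # queues per cluster (raised from 10)
MAX_RESOURCES_PER_QUEUE = 40     # compute resources in a single queue
MAX_RESOURCES_PER_CLUSTER = 150  # compute resources summed over all queues


def check_limits(resources_per_queue: Dict[str, int]) -> List[str]:
    """Return one message per violated limit; an empty list means the layout fits."""
    errors = []
    if len(resources_per_queue) > MAX_QUEUES:
        errors.append(f"{len(resources_per_queue)} queues exceed the limit of {MAX_QUEUES}")
    for name, count in resources_per_queue.items():
        if count > MAX_RESOURCES_PER_QUEUE:
            errors.append(f"queue {name} has {count} compute resources, limit is {MAX_RESOURCES_PER_QUEUE}")
    total = sum(resources_per_queue.values())
    if total > MAX_RESOURCES_PER_CLUSTER:
        errors.append(f"{total} compute resources in total exceed the limit of {MAX_RESOURCES_PER_CLUSTER}")
    return errors


# Hypothetical layout: 100 queues with 2 compute resources each.
print(check_limits({f"queue{i}": 2 for i in range(100)}))
```

The example layout stays within the queue count and the per-queue bound but still fails, because 100 queues with 2 compute resources each exceed the cluster-wide cap of 150.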

api/README.md

Lines changed: 5 additions & 92 deletions
@@ -91,98 +91,11 @@ correctness of the API model evey time a PR is opened.
 The ParallelCluster OpenAPI Generator workflow (`workdlows/openapi_generator.yml`) defines a `generate-openapi-model`
 build step that automatically adds to the PR the generated OpenAPI model in case this was not included in the commit.

-## Packaging the API as an AWS Lambda container
+## Testing

-The `docker/awslambda` directory contains the definition of a Dockerfile that is used to package the ParallelCluster
-API as an AWS Lambda function. Running the `docker/awslambda/docker-build.sh` script will produce a `pcluster-lambda`
-Docker container that packages and exposes the ParallelCluster API in a format which is compatible with the AWS Lambda runtime.
+The API is a facade ontop of the controllers (as well as the CLI) so much of the underlying functionality can be tested
+through unit tests and integration tests that exercise the operations.

-### Running Testing and Debugging the API locally
+In order to test the API specifically, there are integraiton tests which will deploy the API and test the functionality using
+the generated client.

-Once the Docker image has been successfully built you have the following options:
-
-#### Run a shell in the container
-Use the following to run a shell in the container: `docker run -it --entrypoint /bin/bash pcluster-lambda`.
-
-This is particularly useful to debug issues with the container runtime.
-
-#### Run a local AWS Lambda endpoint
-Use the following to run a local AWS Lambda endpoint hosting the API: `docker run -e POWERTOOLS_TRACE_DISABLED=1 -e AWS_REGION=eu-west-1 -p 9000:8080 pcluster-lambda`
-
-Then you can use the following to send requests to the local endpoint:
-`curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d @docker/awslambda/test-events/event.json`
-
-This is useful to test the integration with AWS Lambda.
-
-#### Run the Flask development server
-Use the following to run a local Flask development server hosting the API: `docker run -p 8080:8080 --entrypoint python pcluster-lambda -m pcluster.api.flask_app`
-
-Then you can navigate to the following url to test the API: `http://0.0.0.0:8080/ui`
-Note that to enable swagger-ui you have to build the docker with `--build-arg PROFILE=dev`.
-
-This is particularly useful to ignore the AWS Lambda layer and directly hit the Flask application with plain HTTP requests.
-An even simpler way to do this which also offers live reloading of the API code, is to just ignore the Docker container
-and run a local Flask server on your host by executing `cd ../cli/src && python -m pcluster.api.flask_app`
-
-## Deploy the API test infrastructure with SAM cli (API Gateway + Lambda)
-The Serverless Application Model Command Line Interface (SAM CLI) is an extension of the AWS CLI that adds functionality
-for building and testing Lambda applications. It uses Docker to run your functions in an Amazon Linux environment that
-matches Lambda. It can also emulate your application's build environment and API.
-
-To use the SAM CLI, you need the following tools.
-
-* SAM CLI - [Install the SAM CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html)
-* Docker - [Install Docker community edition](https://hub.docker.com/search/?type=edition&offering=community)
-
-You may need the following for local testing.
-* [Python 3 installed](https://www.python.org/downloads/)
-
-The `docker/awslambda/sam` directory contains a sample [SAM](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html)
-template that can be used to test the ParallelCluster API.
-
-### Run a local AWS APIGateway endpoint with SAM
-The SAM template can be used together with the SAM CLI to locally test the ParallelCluster API as if it were hosted
-behind an API Gateway endpoint.
-
-To do so move to the `docker/awslambda/sam` directory and run:
-
-```bash
-sam build
-sam local start-api
-```
-
-To only invoke the AWS Lambda function locally you can run:
-```bash
-sam build
-sam local invoke ParallelClusterFunction --event ../test-events/event.json
-```
-
-For further details and
-to review all the testing features available through SAM please refer to the official
-[SAM docs](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-test-and-debug.html).
-
-### Deploy the API test infrastructure
-To build and deploy your application for the first time, run the following in your shell:
-
-```bash
-sam build
-sam deploy --guided
-```
-
-The first command will build a docker image from a Dockerfile and then copy the source of your application inside the Docker image.
-The second command will package and deploy your application to AWS, with a series of prompts.
-
-#### Fetch, tail, and filter Lambda function logs
-
-To simplify troubleshooting, SAM CLI has a command called `sam logs`. `sam logs` lets you fetch logs generated by your
-deployed Lambda function from the command line. In addition to printing the logs on the terminal, this command has
-several nifty features to help you quickly find the bug.
-
-NOTE: This command works for all AWS Lambda functions; not just the ones you deploy using SAM.
-
-```bash
-sam logs -n ParallelClusterFunction --stack-name pcluster-lambda --tail
-```
-
-You can find more information and examples about filtering Lambda function logs in the
-[SAM CLI Documentation](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-logging.html).

api/client/patch-client.sh

Lines changed: 2 additions & 0 deletions
@@ -7,6 +7,8 @@
 # OR CONDITIONS OF ANY KIND, express or implied. See the License for the specific language governing permissions and
 # limitations under the License.

+set -ex
+
 cp client/resources/sigv4_auth.py client/src/pcluster_client
 patch -u -N client/src/pcluster_client/api_client.py < client/resources/api_client.py.patch
 patch -u -N client/src/requirements.txt < client/resources/client-requirements.txt.patch

api/client/resources/api_client.py.patch

Lines changed: 3 additions & 3 deletions
@@ -8,10 +8,10 @@


 class ApiClient(object):
-@@ -603,6 +604,9 @@
-if not auth_settings:
+@@ -633,6 +634,9 @@ class ApiClient(object):
+headers, queries, resource_path, method, body, auth_setting)
 return
-
+
 + if 'aws.auth.sigv4' in auth_settings:
 + sigv4_auth(method, self.configuration.host, resource_path, queries, body, headers)
 +

api/docker/awslambda/sam/template.yaml

Lines changed: 0 additions & 72 deletions
This file was deleted.
