Skip to content

Conversation

@gmarciani
Copy link
Contributor

@gmarciani gmarciani commented Oct 28, 2025

Description of changes

This PR mitigates the performance degradation reported in #6449

Add validator ExtraChefAttributesValidator to warn the user against the consequences of disabling in-place updates through DevSettings, i.e. when the chef attribute in_place_update_on_fleet_enabled is set to False.

We introduced such new attribute in aws/aws-parallelcluster-cookbook#3040
See that PR for details about what the cookbook does when cfnhup is disabled.
Such attribute takes effect at config time (cluster creation/update, not at build image).

The validator does what follows:

  1. validates that the attribute cluster.in_place_update_on_fleet_enabled is a boolean value.
  2. if the attribute cluster.in_place_update_on_fleet_enabled is false, returns a warning to notify the user that disabling cfn-hup has consequences on the cluster update behavior.

User Experience

[UseCase 1] in-place updates enabled (implicitly)

By omitting the chef attribute in_place_update_on_fleet_enabled, cfn-hup is enabled. No warning are returned by the CLI

[UseCase 2] in-place updates enabled (explicitly)

By setting in_place_update_on_fleet_enabled to True, cfn-hup is enabled. No warning are returned by the CLI.

Config:

DevSettings:
  Cookbook:
    ExtraChefAttributes: |
      { "cluster" : { "in_place_update_on_fleet_enabled" : "true" } }

[UseCase 3] in-place updates disabled

When the cluster is created with the following attribute, cluster creation/update succeeds with a warning:

Config:

DevSettings:
  Cookbook:
    ExtraChefAttributes: |
      { "cluster" : { "in_place_update_on_fleet_enabled" : "false" } }

Warning:

    {
      "level": "WARNING",
      "type": "ExtraChefAttributesValidator",
      "message": "When in-place updates are disabled, cluster updates are applied by replacing compute and login nodes according to the selected QueueUpdateStrategy."
    }

[UseCase 4]: invalid value for the attribute

When the user provides an invalid value for in_place_update_on_fleet_enabled (i.e.: not boolean), cluster creation/update fails with error.

Config:

DevSettings:
    ExtraChefAttributes: |
      { "cluster" : { "in_place_update_on_fleet_enabled" : "XXXX" } }

Error:

    {
      "level": "ERROR",
      "type": "ExtraChefAttributesValidator",
      "message": "Invalid value in DevSettings/Cookbook/ExtraChefAttributes: attribute 'in_place_update_on_fleet_enabled' must be a boolean value."
    }

[UseCase 5]: invalid value for ExtraChefAttributes

When the user provides an invalid JSON for ExtraChefAttributes, cluster creation/update fails with a schema validation error (existing logic, mentioning for completeness).

Config:

DevSettings:
  Cookbook:
    ExtraChefAttributes: |
      {INVALID JSON}
{
  "configurationValidationErrors": [
    {
      "level": "ERROR",
      "type": "ConfigSchemaValidator",
      "message": "[('DevSettings', {'Cookbook': {'ExtraChefAttributes': [\"'{INVALID JSON}\\n' is invalid\"]}})]"
    }
  ],
  "message": "Invalid cluster configuration."
}

Tests

  • Unit tests (existing and new ones)
  • Manually validated all the use cases reported in User Experience.

Q&A

  1. Why not testing the update of the new attribute?
    Updates to ExtraChefAttributes have never been supported as per update policy here

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani added the 3.x label Oct 28, 2025
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/performance-disable-cfnhup-on-compute-nodes-1024-1 branch 2 times, most recently from 44c9bac to ca1e0be Compare October 28, 2025 20:23
@gmarciani gmarciani marked this pull request as ready for review October 28, 2025 22:19
@gmarciani gmarciani requested review from a team as code owners October 28, 2025 22:19
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/performance-disable-cfnhup-on-compute-nodes-1024-1 branch 3 times, most recently from b626d80 to e2eddd0 Compare October 29, 2025 17:45
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/performance-disable-cfnhup-on-compute-nodes-1024-1 branch 3 times, most recently from ef8d27a to 65c2979 Compare October 29, 2025 18:10
@gmarciani gmarciani changed the title [Validators] Add validator ExtraChefAttributesValidator to validate format of ExtraChefAttributes and notify the user about downsides of cfnhup_on_fleet_enabled=False [Validators] Add validator ExtraChefAttributesValidator to validate format of ExtraChefAttributes and notify the user about downsides of in_place_update_on_fleet_enabled=False Oct 29, 2025
@gmarciani gmarciani changed the title [Validators] Add validator ExtraChefAttributesValidator to validate format of ExtraChefAttributes and notify the user about downsides of in_place_update_on_fleet_enabled=False [Validators] Add validator ExtraChefAttributesValidator to validate format of ExtraChefAttributes and notify the user about downsides of in_place_update_on_fleet_enabled=False Oct 29, 2025
… DevSettings/Cookbook/ExtraChefAttributes.

It fails with an error if the attribute `cluster.in_place_update_on_fleet_enabled` is not a boolean value.

It fails with a warning if the attribute `cluster.in_place_update_on_fleet_enabled` is false, notifying the user about the consequences of disabling in-place updates.
@gmarciani gmarciani force-pushed the wip/mgiacomo/3150/performance-disable-cfnhup-on-compute-nodes-1024-1 branch from 65c2979 to 8960cc4 Compare October 29, 2025 19:16
@himani2411
Copy link
Contributor

Have we tested this use-case? where we create a cluster with in_place_update_on_fleet_enabled as false and then try to update the cluster to use in_place_update_on_fleet_enabled as true?

@gmarciani
Copy link
Contributor Author

Have we tested this use-case? where we create a cluster with in_place_update_on_fleet_enabled as false and then try to update the cluster to use in_place_update_on_fleet_enabled as true?

Updates to ExtraChefAttributes have never been supported as per update policy here

Added to Q&A section.

Returns:
True if value is "true" or "false" (case-insensitive), False otherwise.
"""
return str(value).lower() in BOOLEAN_VALUES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Non-blocking] Seems redundant to convert a string to a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is to protect against None values.

If you remove the string conversion and pass None as input, the function would throw the error AttributeError: 'NoneType' object has no attribute 'lower'

Copy link
Contributor

@himani2411 himani2411 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@gmarciani gmarciani merged commit b27bc19 into aws:develop Oct 30, 2025
24 checks passed
@gmarciani gmarciani deleted the wip/mgiacomo/3150/performance-disable-cfnhup-on-compute-nodes-1024-1 branch October 30, 2025 16:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants