fix: include zero value local_ssd_ephemeral_storage_count for GKE nodepool disktype=hyperdisk-balanced… #2474
Conversation
… nodepool machine_type
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Summary of Changes
Hello @shlomimn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue where Terraform would incorrectly attempt to replace GKE nodepools configured with `disk_type = "hyperdisk-balanced"`.
Code Review
This pull request aims to fix an issue where Terraform produces a perpetual diff for GKE node pools using the `hyperdisk-balanced` disk type, because the GKE API returns the `ephemeralStorageLocalSsdConfig` block even with zero counts. While the intent is correct, the implementation has a couple of issues. First, the change is made to a generated file (`modules/beta-public-cluster/cluster.tf`) and should instead be applied to the source template (`autogen/main/cluster.tf.tmpl`) to be effective and persistent. Second, the proposed `>= 0` logic is too broad and will add the `ephemeral_storage_local_ssd_config` block to all node pools, which could cause issues for configurations that do not support it. I've suggested a more targeted fix that applies the logic only when necessary. It would also be beneficial to add an integration test to cover this new scenario.
```diff
 dynamic "ephemeral_storage_local_ssd_config" {
-  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
+  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
```
This change should be applied to the template file `autogen/main/cluster.tf.tmpl` instead of this generated file. Otherwise, it will be overwritten.
Additionally, the condition `lookup(each.value, "local_ssd_ephemeral_storage_count", 0) >= 0` is too broad. Since `lookup` with a default value of 0 will always return a number greater than or equal to zero, this condition is always true, so the `ephemeral_storage_local_ssd_config` block would be created for all node pools, which may have unintended consequences for node pools that do not support this configuration.
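To make that concrete, here is a minimal sketch (the `node_pool` local and its values are made up for illustration) showing that the `>= 0` comparison can never be false:

```hcl
# Hypothetical node pool map with neither local SSD count key set.
locals {
  node_pool = {
    name      = "example-pool"
    disk_type = "hyperdisk-balanced"
  }
}

# lookup() falls back to its default (0) when the key is absent, so the
# result is always >= 0 and the dynamic block would be emitted for every
# node pool, not just hyperdisk-balanced ones.
output "always_true" {
  value = lookup(local.node_pool, "local_ssd_ephemeral_storage_count", 0) >= 0 # => true
}
```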
Based on your description, the issue is specific to disk_type = "hyperdisk-balanced". A safer approach is to add this specific case to the condition, preserving the original logic for other cases.
```hcl
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
```
Regarding the suggested answer: since we are dealing with SSDs, and since my solution was flagged as too broad, I think we can make the following change instead:

```hcl
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-ssd") == "hyperdisk-balanced" ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```
…eation related to disk_type=hyperdisk-balanced to narrow the solution scope
/gemini review
Code Review
This pull request aims to fix an issue where Terraform plans to replace a GKE node pool with disk_type = "hyperdisk-balanced" because the ephemeral_storage_local_ssd_config block is missing. The change correctly adds a condition to include this block when using hyperdisk-balanced.
However, there are a few critical issues with the current implementation:
- Modification of a generated file: The change is made directly in `modules/beta-private-cluster/cluster.tf`, which is an auto-generated file. The file header clearly states: `// This file was automatically generated from a template in ./autogen/main`. Any changes should be made to the template file, `autogen/main/cluster.tf.tmpl`, and then the `cluster.tf` file should be regenerated. Otherwise, your changes will be overwritten in the future.
- Incomplete fix: The fix has only been applied to the `google_container_node_pool.windows_pools` resource. The same issue exists for Linux node pools defined in the `google_container_node_pool.pools` resource in the same file. Applying the fix to the template file (`autogen/main/cluster.tf.tmpl`) would resolve this for both Windows and Linux node pools; a sketch of that edit follows this list.
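For reference, a sketch of what that edit might look like in `autogen/main/cluster.tf.tmpl`, assuming the template carries this HCL block essentially verbatim (check the real template for any surrounding templating directives):

```hcl
# Sketch for autogen/main/cluster.tf.tmpl: editing the condition once here
# lets both the pools and windows_pools resources pick it up on regeneration.
dynamic "ephemeral_storage_local_ssd_config" {
  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
  content {
    local_ssd_count  = lookup(each.value, "local_ssd_ephemeral_storage_count", 0)
    data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0)
  }
}
```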
I have also added a specific comment regarding an inconsistency in your code change.
```diff
 dynamic "ephemeral_storage_local_ssd_config" {
-  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : []
+  for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-ssd") == "hyperdisk-balanced" ? [1] : []
```
The default value for `disk_type` in this lookup is `"pd-ssd"`, which is inconsistent with the default value `"pd-standard"` used when setting the `disk_type` attribute on line 1315. To ensure consistency and prevent potential issues if the default value changes in the future, it's best to use the same default in both places.
```hcl
for_each = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0 || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 || lookup(each.value, "disk_type", "pd-standard") == "hyperdisk-balanced" ? [1] : []
```
Preview
I recently tried to create a new nodepool with `machine_type = "c4-highcpu-4"` and `disk_type = "hyperdisk-balanced"`.
My GCP-managed cluster control plane version is 1.32.8-gke.1134000.
I am using Terraform to add the new GKE nodepool.
Since my nodepool uses `disk_type = "hyperdisk-balanced"` (and not `disk_type = "pd-ssd"`), the GKE API always assumes that the node pool might want ephemeral storage (since this disk type is typically chosen for high-performance workloads), so it always exposes `ephemeralStorageLocalSsdConfig: {}` even if `local_ssd_count = 0`.
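A minimal sketch of the module call that reproduces this (the pool name comes from the plan output below; the module source and other values are illustrative, and required arguments are elided):

```hcl
module "gke" {
  source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  # ...project_id, name, region, network arguments elided...

  node_pools = [
    {
      name         = "event-proc"
      machine_type = "c4-highcpu-4"
      disk_type    = "hyperdisk-balanced"
      # local_ssd_ephemeral_storage_count is intentionally not set:
      # the GKE API still reports ephemeralStorageLocalSsdConfig: {}.
    },
  ]
}
```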
Problem
Following the preview above, the API wants to add `ephemeralStorageLocalSsdConfig: {}` when creating a new nodepool with `disk_type = "hyperdisk-balanced"`, but the module code then checks whether a local SSD count is set. Since `local_ssd_ephemeral_storage_count` doesn't exist in my nodepool configuration, Terraform wants to omit `ephemeralStorageLocalSsdConfig: {}`, and the mismatch with the API causes a perpetual diff.
Relevant Code in cluster.tf
https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/blob/main/modules/beta-private-cluster/cluster.tf
dynamic "ephemeral_storage_local_ssd_config" { for_each = **lookup(each.value, "local_ssd_ephemeral_storage_count", 0) > 0** || lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) > 0 ? [1] : [] content { local_ssd_count = lookup(each.value, "local_ssd_ephemeral_storage_count", 0) data_cache_count = lookup(each.value, "ephemeral_storage_local_ssd_data_cache_count", 0) } }FInal Result
Finally, the Terraform plan shows: `Plan: 1 to add, 0 to change, 1 to destroy.`

```
# module.gke.google_container_cluster.primary has changed
~ resource "google_container_cluster" "primary" {
    ...

Terraform will perform the following actions:

  # module.gke.google_container_node_pool.pools["event-proc"] must be replaced
-/+ resource "google_container_node_pool" "pools" {
    ...
```
Suggested Solution
Add an option in the code not to remove the `dynamic "ephemeral_storage_local_ssd_config"` block when `local_ssd_count = 0`. Keeping the block in that case aligns Terraform with the API, which exposes `ephemeralStorageLocalSsdConfig: {}` even when `local_ssd_count = 0`.
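Under that approach, a `hyperdisk-balanced` node pool with no counts set would render an explicitly zeroed block (a sketch, assuming the condition suggested in the review above):

```hcl
# What the dynamic block renders to when disk_type = "hyperdisk-balanced"
# and neither count is set: zeros that match the API's
# ephemeralStorageLocalSsdConfig: {}, so no perpetual diff.
ephemeral_storage_local_ssd_config {
  local_ssd_count  = 0
  data_cache_count = 0
}
```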