test(robot): add test case Check If Nodes Are Under Memory Pressure After Cluster Restart #2285
base: master
Conversation
…fter Cluster Restart
Signed-off-by: Yang Chiu <[email protected]>
Walkthrough
This pull request introduces a comprehensive set of changes focused on monitoring node memory metrics and pressure in a Kubernetes cluster. A new resource file (e2e/keywords/metrics.resource) is introduced, along with new metrics library and keyword modules, node helper methods, and an additional cluster restart test case.

Changes
Assessment against linked issues
Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
Actionable comments posted: 5
🧹 Nitpick comments (4)
e2e/keywords/metrics.resource (1)
Lines 7-12: Add documentation for the keyword.

Add documentation to explain the purpose and behavior of the keyword:

*** Keywords ***
Check if nodes are under memory pressure
+    [Documentation]    Checks if any worker nodes in the cluster are under memory pressure.
+    ...
+    ...    This keyword:
+    ...    1. Retrieves all worker nodes
+    ...    2. For each node:
+    ...       - Gets memory usage percentage
+    ...       - Checks if node is under memory pressure
+    ...
+    ...    Fails if any node is under memory pressure.
    ${worker_nodes} =    get_worker_nodes
    FOR    ${worker_node}    IN    @{worker_nodes}
        get_node_memory_usage_in_percentage    ${worker_node}
        check_if_node_under_memory_pressure    ${worker_node}
    END

e2e/tests/negative/cluster_restart.robot (2)
Lines 83-124: Add documentation and consider test optimization.
- Add documentation to explain the test's purpose and expectations
- Consider parameterizing the data size (1024 MB) for flexibility
Check If Nodes Are Under Memory Pressure After Cluster Restart
+    [Documentation]    Verifies that nodes do not experience memory pressure after cluster restart.
+    ...
+    ...    Test Steps:
+    ...    1. Creates multiple storage classes with different configurations
+    ...    2. Creates stateful sets using these storage classes
+    ...    3. Writes ${DATA_SIZE} MB of data to each stateful set
+    ...    4. In a loop:
+    ...       - Creates backups for each volume
+    ...       - Restarts the cluster
+    ...       - Verifies workload stability
+    ...       - Checks for memory pressure
+    [Variables]    ${DATA_SIZE}=1024
    [Tags]    cluster
    Given Create storageclass longhorn-test with    dataEngine=${DATA_ENGINE}
    And Create storageclass strict-local with    numberOfReplicas=1    dataLocality=strict-local    dataEngine=${DATA_ENGINE}
    And Create storageclass nfs-4-2 with    nfsOptions=vers=4.2,noresvport,timeo=450,retrans=8    dataEngine=${DATA_ENGINE}
    And Create storageclass nfs-hard-mount with    nfsOptions=hard,timeo=50,retrans=1    dataEngine=${DATA_ENGINE}
    And Create storageclass nfs-soft-mount with    nfsOptions=soft,timeo=250,retrans=5    dataEngine=${DATA_ENGINE}
    And Create statefulset 0 using RWO volume with longhorn-test storageclass
    And Create statefulset 1 using RWX volume with longhorn-test storageclass
    And Create statefulset 2 using RWO volume with strict-local storageclass
    And Create statefulset 3 using RWX volume with nfs-4-2 storageclass
    And Create statefulset 4 using RWX volume with nfs-hard-mount storageclass
    And Create statefulset 5 using RWX volume with nfs-soft-mount storageclass
-    And Write 1024 MB data to file data.bin in statefulset 0
-    And Write 1024 MB data to file data.bin in statefulset 1
-    And Write 1024 MB data to file data.bin in statefulset 2
-    And Write 1024 MB data to file data.bin in statefulset 3
-    And Write 1024 MB data to file data.bin in statefulset 4
-    And Write 1024 MB data to file data.bin in statefulset 5
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 0
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 1
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 2
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 3
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 4
+    And Write ${DATA_SIZE} MB data to file data.bin in statefulset 5
Lines 103-111: Consider parallelizing backup creation.

The sequential creation of backups might extend test duration unnecessarily. Consider implementing parallel backup creation if the test framework supports it; a rough sketch follows.
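For illustration only, and not something this PR needs to adopt: Robot Framework does not execute keywords concurrently on its own, so the fan-out would have to live in a Python keyword. A minimal sketch, assuming a callable backup helper is available (create_backup_for_volume and the volume names are hypothetical stand-ins, not part of this PR):

from concurrent.futures import ThreadPoolExecutor, as_completed

def create_backups_in_parallel(volume_names, create_backup_for_volume, max_workers=4):
    # Submit one backup job per volume and collect per-volume failures
    # instead of stopping at the first error.
    failures = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(create_backup_for_volume, name): name for name in volume_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
            except Exception as exc:
                failures[name] = exc
    if failures:
        raise AssertionError(f"Backup creation failed for: {failures}")

Whether this is worthwhile depends on where the time is spent; if all backups are bottlenecked on the same backup target, parallel submission may not shorten the test much.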
e2e/libs/node/node.py (1)
Lines 123-125: Add error handling for non-existent nodes.

The method should handle cases where the node doesn't exist or the memory capacity information is unavailable.

 def get_node_total_memory(self, node_name):
-    node = self.get_node_by_name(node_name)
-    return node.status.capacity['memory']
+    try:
+        node = self.get_node_by_name(node_name)
+        return node.status.capacity['memory']
+    except Exception as e:
+        raise RuntimeError(f"Failed to get memory capacity for node {node_name}: {str(e)}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
e2e/keywords/metrics.resource (1 hunks)
e2e/libs/keywords/metrics_keywords.py (1 hunks)
e2e/libs/metrics/metrics.py (1 hunks)
e2e/libs/node/node.py (1 hunks)
e2e/tests/negative/cluster_restart.robot (2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
e2e/libs/node/node.py
133-133: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
e2e/libs/metrics/metrics.py
11-11: Loop control variable i not used within loop body. Rename unused i to _i. (B007)

22-22: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
e2e/libs/keywords/metrics_keywords.py
48-48: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build images
- GitHub Check: Summary
🔇 Additional comments (1)
e2e/libs/node/node.py (1)
Lines 123-134: LGTM! Changes align well with PR objectives.

The added methods provide the necessary functionality to support testing node memory pressure after cluster restart. The implementation is clean and follows existing patterns in the codebase.
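For reviewers who want to spot-check the same signals the new helpers read, a minimal standalone query against the Kubernetes API (this mirrors, but does not use, the PR's Node methods; the node name "worker-1" is an assumption and a reachable kubeconfig is required):

from kubernetes import client, config

config.load_kube_config()                      # or config.load_incluster_config() when run in-cluster
v1 = client.CoreV1Api()
node = v1.read_node("worker-1")                # hypothetical node name
capacity = node.status.capacity["memory"]      # what get_node_total_memory returns, e.g. "8148536Ki"
pressure = next(c.status for c in node.status.conditions if c.type == "MemoryPressure")
print(f"capacity={capacity}, MemoryPressure={pressure}")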
🧰 Tools
🪛 Ruff (0.8.2)
133-133: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
def get_node_metrics(node_name, metrics_name):
    retry_count, retry_interval = get_retry_count_and_interval()
    for i in range(retry_count):
        api = client.CustomObjectsApi()
        try:
            node_metrics = api.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
            for node in node_metrics['items']:
                if node_name == node['metadata']['name']:
                    logging(f"Got node {node_name} metrics {metrics_name} = {node['usage'][metrics_name]}")
                    return node['usage'][metrics_name]
        except ApiException as e:
            logging(f"Failed to get node {node_name} metrics {metrics_name}: {e}")
        time.sleep(retry_interval)
    assert False, f"Failed to get node {node_name} metrics {metrics_name}"
🛠️ Refactor suggestion
Add type hints, docstring and improve error handling.
The function implementation could be improved in several ways:
- Add type hints for parameters and return value
- Add docstring explaining the function's purpose, parameters, and return value
- Replace assert False with raise AssertionError
- Validate the metrics_name parameter
-def get_node_metrics(node_name, metrics_name):
+def get_node_metrics(node_name: str, metrics_name: str) -> str:
+    """Retrieve metrics for a specified Kubernetes node.
+
+    Args:
+        node_name: Name of the node to retrieve metrics for
+        metrics_name: Type of metrics to retrieve (e.g., 'memory', 'cpu')
+
+    Returns:
+        str: The metrics value for the specified node and metrics type
+
+    Raises:
+        AssertionError: If unable to retrieve metrics after retries
+    """
+    if metrics_name not in ['memory', 'cpu']:
+        raise ValueError(f"Unsupported metrics type: {metrics_name}")
     retry_count, retry_interval = get_retry_count_and_interval()
-    for i in range(retry_count):
+    for _i in range(retry_count):
         api = client.CustomObjectsApi()
         try:
             node_metrics = api.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
             for node in node_metrics['items']:
                 if node_name == node['metadata']['name']:
                     logging(f"Got node {node_name} metrics {metrics_name} = {node['usage'][metrics_name]}")
                     return node['usage'][metrics_name]
         except ApiException as e:
             logging(f"Failed to get node {node_name} metrics {metrics_name}: {e}")
         time.sleep(retry_interval)
-    assert False, f"Failed to get node {node_name} metrics {metrics_name}"
+    raise AssertionError(f"Failed to get node {node_name} metrics {metrics_name}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_node_metrics(node_name: str, metrics_name: str) -> str:
    """Retrieve metrics for a specified Kubernetes node.

    Args:
        node_name: Name of the node to retrieve metrics for
        metrics_name: Type of metrics to retrieve (e.g., 'memory', 'cpu')

    Returns:
        str: The metrics value for the specified node and metrics type

    Raises:
        AssertionError: If unable to retrieve metrics after retries
    """
    if metrics_name not in ['memory', 'cpu']:
        raise ValueError(f"Unsupported metrics type: {metrics_name}")
    retry_count, retry_interval = get_retry_count_and_interval()
    for _i in range(retry_count):
        api = client.CustomObjectsApi()
        try:
            node_metrics = api.list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
            for node in node_metrics['items']:
                if node_name == node['metadata']['name']:
                    logging(f"Got node {node_name} metrics {metrics_name} = {node['usage'][metrics_name]}")
                    return node['usage'][metrics_name]
        except ApiException as e:
            logging(f"Failed to get node {node_name} metrics {metrics_name}: {e}")
        time.sleep(retry_interval)
    raise AssertionError(f"Failed to get node {node_name} metrics {metrics_name}")
🧰 Tools
🪛 Ruff (0.8.2)
11-11: Loop control variable i not used within loop body. Rename unused i to _i. (B007)

22-22: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
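For context on the metrics_name validation suggested above, the metrics.k8s.io node listing reports usage as a small per-node map, so 'cpu' and 'memory' are the only meaningful keys. An illustrative item shape (values and node name are made up, not captured from a real cluster):

# Illustrative shape of one item from list_cluster_custom_object("metrics.k8s.io", "v1beta1", "nodes")
node_metrics_item = {
    "metadata": {"name": "worker-1"},
    "timestamp": "2025-01-01T00:00:00Z",
    "window": "10.25s",
    "usage": {"cpu": "137m", "memory": "1368128Ki"},
}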
def __init__(self):
    self.node = Node()
    retry_count, retry_interval = get_retry_count_and_interval()
Initialize retry_interval in constructor.
The retry_interval value is used in check_if_node_under_memory_pressure but is not initialized in the constructor; the values returned by get_retry_count_and_interval() are assigned to local variables and discarded.
 def __init__(self):
     self.node = Node()
-    retry_count, retry_interval = get_retry_count_and_interval()
+    self.retry_count, self.retry_interval = get_retry_count_and_interval()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def __init__(self):
    self.node = Node()
    self.retry_count, self.retry_interval = get_retry_count_and_interval()
def check_if_node_under_memory_pressure(self, node_name):
    logging(f"Checking if node {node_name} is under memory pressure")
    condition_status = self.node.get_node_condition(node_name, "MemoryPressure")
    if condition_status == "True":
        logging(f"Node {node_name} is under memory pressure")
        time.sleep(self.retry_count)
        assert False, f"Node {node_name} is under memory pressure"
🛠️ Refactor suggestion
Improve error handling in check_if_node_under_memory_pressure.
Replace assert False with a proper exception and add retry logging.
 def check_if_node_under_memory_pressure(self, node_name):
     logging(f"Checking if node {node_name} is under memory pressure")
     condition_status = self.node.get_node_condition(node_name, "MemoryPressure")
     if condition_status == "True":
         logging(f"Node {node_name} is under memory pressure")
-        time.sleep(self.retry_count)
-        assert False, f"Node {node_name} is under memory pressure"
+        logging(f"Waiting {self.retry_interval}s before failing...")
+        time.sleep(self.retry_interval)
+        raise AssertionError(f"Node {node_name} is under memory pressure")
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.8.2)
48-48: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
def get_node_total_memory_in_mi(self, node_name):
    total_memory = self.node.get_node_total_memory(node_name)
    if "Ki" in total_memory:
        total_memory = int(total_memory.replace("Ki", ""))
        total_memory = total_memory / 1024
    elif "Mi" in total_memory:
        total_memory = int(total_memory.replace("Mi", ""))
    logging(f'Got node {node_name} total memory: {total_memory} Mi')
    return total_memory

def get_node_memory_usage_in_mi(self, node_name):
    memory_usage = get_node_metrics(node_name, 'memory')
    if "Ki" in memory_usage:
        memory_usage = int(memory_usage.replace("Ki", ""))
        memory_usage = memory_usage / 1024
    elif "Mi" in memory_usage:
        memory_usage = int(memory_usage.replace("Mi", ""))
    logging(f'Got node {node_name} memory usage: {memory_usage} Mi')
    return memory_usage

def get_node_memory_usage_in_percentage(self, node_name):
    memory_usage_in_mi = self.get_node_memory_usage_in_mi(node_name)
    total_memory_in_mi = self.get_node_total_memory_in_mi(node_name)
    memory_usage_in_percentage = memory_usage_in_mi / total_memory_in_mi * 100
    logging(f'Got node {node_name} memory usage: {memory_usage_in_percentage} %')
    return memory_usage_in_percentage
🛠️ Refactor suggestion
Refactor memory conversion logic to reduce code duplication.
The memory conversion logic is duplicated in both get_node_total_memory_in_mi and get_node_memory_usage_in_mi. Extract this into a helper method.
+def _convert_to_mi(self, memory_value: str) -> float:
+    """Convert memory value to MiB.
+
+    Args:
+        memory_value: Memory value with unit (e.g., "1024Ki", "512Mi")
+
+    Returns:
+        float: Memory value in MiB
+    """
+    if "Ki" in memory_value:
+        return int(memory_value.replace("Ki", "")) / 1024
+    elif "Mi" in memory_value:
+        return int(memory_value.replace("Mi", ""))
+    return float(memory_value)  # Assume MiB if no unit
+
 def get_node_total_memory_in_mi(self, node_name):
     total_memory = self.node.get_node_total_memory(node_name)
-    if "Ki" in total_memory:
-        total_memory = int(total_memory.replace("Ki", ""))
-        total_memory = total_memory / 1024
-    elif "Mi" in total_memory:
-        total_memory = int(total_memory.replace("Mi", ""))
+    total_memory = self._convert_to_mi(total_memory)
     logging(f'Got node {node_name} total memory: {total_memory} Mi')
     return total_memory

 def get_node_memory_usage_in_mi(self, node_name):
     memory_usage = get_node_metrics(node_name, 'memory')
-    if "Ki" in memory_usage:
-        memory_usage = int(memory_usage.replace("Ki", ""))
-        memory_usage = memory_usage / 1024
-    elif "Mi" in memory_usage:
-        memory_usage = int(memory_usage.replace("Mi", ""))
+    memory_usage = self._convert_to_mi(memory_usage)
     logging(f'Got node {node_name} memory usage: {memory_usage} Mi')
     return memory_usage
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _convert_to_mi(self, memory_value: str) -> float:
    """Convert memory value to MiB.

    Args:
        memory_value: Memory value with unit (e.g., "1024Ki", "512Mi")

    Returns:
        float: Memory value in MiB
    """
    if "Ki" in memory_value:
        return int(memory_value.replace("Ki", "")) / 1024
    elif "Mi" in memory_value:
        return int(memory_value.replace("Mi", ""))
    return float(memory_value)  # Assume MiB if no unit

def get_node_total_memory_in_mi(self, node_name):
    total_memory = self.node.get_node_total_memory(node_name)
    total_memory = self._convert_to_mi(total_memory)
    logging(f'Got node {node_name} total memory: {total_memory} Mi')
    return total_memory

def get_node_memory_usage_in_mi(self, node_name):
    memory_usage = get_node_metrics(node_name, 'memory')
    memory_usage = self._convert_to_mi(memory_usage)
    logging(f'Got node {node_name} memory usage: {memory_usage} Mi')
    return memory_usage

def get_node_memory_usage_in_percentage(self, node_name):
    memory_usage_in_mi = self.get_node_memory_usage_in_mi(node_name)
    total_memory_in_mi = self.get_node_total_memory_in_mi(node_name)
    memory_usage_in_percentage = memory_usage_in_mi / total_memory_in_mi * 100
    logging(f'Got node {node_name} memory usage: {memory_usage_in_percentage} %')
    return memory_usage_in_percentage
def get_node_condition(self, node_name, condition_type):
    node = self.get_node_by_name(node_name)
    for condition in node.status.conditions:
        if condition.type == condition_type:
            logging(f"Got node {node_name} condition {condition_type}: {condition}")
            return condition.status
    assert False, f"Failed to get node {node_name} condition {condition_type}: {node}"
🛠️ Refactor suggestion
Improve error handling and add documentation.
- Replace assert False with proper exception handling, as it can be removed by Python's -O flag.
- Add docstring and type hints for better code documentation.
-def get_node_condition(self, node_name, condition_type):
+def get_node_condition(self, node_name: str, condition_type: str) -> str:
+    """Get the status of a specific condition type for a node.
+
+    Args:
+        node_name: Name of the node to check.
+        condition_type: Type of condition to retrieve (e.g., 'Ready', 'MemoryPressure').
+
+    Returns:
+        str: The status of the condition ('True', 'False', or 'Unknown').
+
+    Raises:
+        RuntimeError: If the condition type is not found for the node.
+    """
     node = self.get_node_by_name(node_name)
     for condition in node.status.conditions:
         if condition.type == condition_type:
             logging(f"Got node {node_name} condition {condition_type}: {condition}")
             return condition.status
-    assert False, f"Failed to get node {node_name} condition {condition_type}: {node}"
+    raise RuntimeError(f"Failed to get node {node_name} condition {condition_type}: {node}")
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def get_node_condition(self, node_name: str, condition_type: str) -> str:
    """Get the status of a specific condition type for a node.

    Args:
        node_name: Name of the node to check.
        condition_type: Type of condition to retrieve (e.g., 'Ready', 'MemoryPressure').

    Returns:
        str: The status of the condition ('True', 'False', or 'Unknown').

    Raises:
        RuntimeError: If the condition type is not found for the node.
    """
    node = self.get_node_by_name(node_name)
    for condition in node.status.conditions:
        if condition.type == condition_type:
            logging(f"Got node {node_name} condition {condition_type}: {condition}")
            return condition.status
    raise RuntimeError(f"Failed to get node {node_name} condition {condition_type}: {node}")
🧰 Tools
🪛 Ruff (0.8.2)
133-133: Do not assert False (python -O removes these calls); raise AssertionError() instead. Replace assert False. (B011)
Which issue(s) this PR fixes:
Issue longhorn/longhorn#10203
What this PR does / why we need it:
add test case Check If Nodes Are Under Memory Pressure After Cluster Restart
Special notes for your reviewer:
Additional documentation or context
Summary by CodeRabbit
New Features
Bug Fixes
Tests