Add preflight OS, CPU, RAM, Swap, and Filesystem checks #326
base: devel
Please avoid using ignore_errors: true as much as possible.
- Implemented OS preflight checks to validate system requirements before Ceph cluster creation.
- Checks include:
  - OS version (RHEL 9+ required)
  - SELinux enforcing mode
  - Firewalld installation and status
  - Required package availability (rpcbind, podman, firewalld)
  - Podman version (>= 3.3)
  - RHEL software profile validation
  - Tuned profile
  - CPU, RAM, swap, and filesystem (part of the other checks)
  - Whether jumbo frames are enabled
  - Whether NICs are configured with DHCP or a static IP
  - Whether the network bandwidth is sufficient
  - Current NIC options (e.g. bonding; not bridged or virtual), collected and reported
  - Network latency (ping) to all hosts in the inventory file, checked and reported
  - Separate NICs for front-end and back-end networks
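As a sketch of how the first item (OS version) could feed the os_check / os_reason variables used by the aggregated results task later in this review, assuming facts were gathered and that RHEL hosts report the distribution as RedHat. The reason is computed in a second task because set_fact cannot read a variable it defines in the same task:

```yaml
# Sketch only: assumes gather_facts: true and that RHEL reports
# ansible_facts['distribution'] == 'RedHat'.
- name: Check if OS is RHEL 9+
  ansible.builtin.set_fact:
    os_check: "{{ 'PASS' if (ansible_facts['distribution'] == 'RedHat' and ansible_facts['distribution_major_version'] | int >= 9) else 'FAIL' }}"

# Separate task: os_check is not visible inside the set_fact that defines it.
- name: Record the OS version check reason
  ansible.builtin.set_fact:
    os_reason: "{{ 'N/A' if os_check == 'PASS' else 'Found ' ~ ansible_facts['distribution'] ~ ' ' ~ ansible_facts['distribution_version'] ~ '; RHEL 9+ is required' }}"
```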
jenkins test el9-functional
@@ -0,0 +1,314 @@
---
Please make it an actual playbook:
---
- hosts: all
  become: true
  gather_facts: true
  tasks:
    - name: Initialize preflight results list
      set_fact:
        preflight_results: []
        preflight_failures: []
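Because preflight_results and preflight_failures are initialised at the start of the play, a closing task could print the collected results and fail the host only after every check has run, which also removes the need for ignore_errors: true on individual checks. A minimal sketch, assuming these tasks are appended at the end of the tasks list:

```yaml
# Sketch: runs after every individual check has appended to
# preflight_results / preflight_failures.
- name: Show preflight results
  ansible.builtin.debug:
    var: preflight_results

- name: Fail if any preflight check failed
  ansible.builtin.assert:
    that:
      - preflight_failures | length == 0
    fail_msg: "Preflight checks failed: {{ preflight_failures | join(', ') }}"
    success_msg: "All preflight checks passed"
```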
- name: Ensure required packages are installed
  package:
    name: "{{ infra_pkgs }}"
    state: present
  register: package_install
  failed_when: false

- name: Determine Package Installation Check Result
  set_fact:
    package_check: "{{ 'PASS' if not package_install.failed else 'FAIL' }}"

- name: Determine Package Installation Failure Reason
  set_fact:
    package_reason: "{{ 'Some required packages failed to install' if package_check == 'FAIL' else 'N/A' }}"

- name: Store Package Installation Result
  set_fact:
    preflight_results: "{{ preflight_results + [{'Check': 'Required Packages Installed', 'Result': package_check, 'Reason': package_reason}] }}"
    preflight_failures: "{{ preflight_failures + ['Required Packages'] if package_check == 'FAIL' else preflight_failures }}"
The cephadm-preflight playbook already handles package installation.
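Since installation is already covered by cephadm-preflight, the check could stay read-only and just report missing packages. A sketch, assuming infra_pkgs is the list of package names already used in this PR and relying on package_facts exposing installed packages as a dict keyed by name:

```yaml
# Sketch: report-only; nothing is installed here.
- name: Collect installed package facts
  ansible.builtin.package_facts:
    manager: auto

# ansible_facts['packages'] is keyed by package name, so 'in' tests presence.
- name: Find required packages that are not installed
  ansible.builtin.set_fact:
    missing_pkgs: "{{ infra_pkgs | reject('in', ansible_facts['packages']) | list }}"

- name: Determine package check result
  ansible.builtin.set_fact:
    package_check: "{{ 'PASS' if missing_pkgs | length == 0 else 'FAIL' }}"
    package_reason: "{{ 'N/A' if missing_pkgs | length == 0 else 'Missing packages: ' ~ missing_pkgs | join(', ') }}"
```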
- name: Ensure firewalld is enabled and running
  systemd:
    name: firewalld
    state: started
    enabled: true
  register: firewall_status
  failed_when: false
We should just move this task to cephadm-preflight.yml
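If starting firewalld moves to cephadm-preflight.yml, the preflight side could simply inspect the service without changing it. A sketch using service_facts, assuming a systemd host where the unit appears as firewalld.service:

```yaml
# Sketch: inspects firewalld without starting or enabling it.
- name: Collect service facts
  ansible.builtin.service_facts:

# Missing unit -> empty dict -> reported as 'not installed'.
- name: Determine firewalld state
  ansible.builtin.set_fact:
    firewalld_state: "{{ (ansible_facts['services']['firewalld.service'] | default({})).get('state', 'not installed') }}"

- name: Determine firewalld check result
  ansible.builtin.set_fact:
    firewalld_check: "{{ 'PASS' if firewalld_state == 'running' else 'FAIL' }}"
    firewalld_reason: "{{ 'N/A' if firewalld_state == 'running' else 'firewalld is ' ~ firewalld_state }}"
```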
- name: Ensure Podman is installed if missing (Fixable)
  package:
    name: podman
    state: present
  when: not podman_installed
cephadm-preflight.yml already handles podman installation.
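With installation left to cephadm-preflight.yml, the remaining podman item from the description is the version requirement (>= 3.3). A sketch that reads the version from package facts instead of running a command, assuming the existing "Collect installed package facts" task has already run earlier in the play:

```yaml
# Sketch: assumes ansible.builtin.package_facts already populated
# ansible_facts['packages']; 'and' short-circuits when podman is absent.
- name: Determine podman check result
  ansible.builtin.set_fact:
    podman_check: >-
      {{ 'PASS' if 'podman' in ansible_facts['packages']
      and ansible_facts['packages']['podman'][0]['version'] is version('3.3', '>=')
      else 'FAIL' }}

- name: Record the podman check reason
  ansible.builtin.set_fact:
    podman_reason: "{{ 'N/A' if podman_check == 'PASS' else 'podman >= 3.3 is required' }}"
```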
- name: Extract NIC Details
  set_fact:
    nic_config_details: "{{ ansible_facts['interfaces'] }}"
This task is unnecessary; just use ansible_facts['interfaces'] directly wherever you need it.
- name: Identify Front-End and Back-End NICs
  set_fact:
    frontend_nic: "{{ ansible_facts['default_ipv4']['interface'] | default('Unknown') }}"
    backend_nic: "{{ ansible_facts['interfaces'] | difference(['lo', ansible_facts['default_ipv4']['interface']]) | first | default(ansible_facts['default_ipv4']['interface']) }}"
This task feels a bit awkward; that's not something Ansible can predict for you
I'd simply list all available network interfaces along with their link speed 🤷♂️
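A sketch of that listing, using ansible_facts['interfaces'] directly. It assumes per-interface facts are stored under ansible_facts[name] with '-' replaced by '_', and falls back to 'n/a' when a key is missing (virtual interfaces often report no speed). The MTU is included because jumbo frames usually show up as an MTU of 9000:

```yaml
# Sketch: report-only listing of each interface's link speed and MTU.
- name: List network interfaces with link speed and MTU
  vars:
    # Assumption: facts key is the interface name with '-' -> '_'.
    iface_facts: "{{ ansible_facts[item | replace('-', '_')] | default({}) }}"
  ansible.builtin.debug:
    msg: "{{ item }}: speed={{ iface_facts.speed | default('n/a') }} Mb/s, mtu={{ iface_facts.mtu | default('n/a') }}"
  loop: "{{ ansible_facts['interfaces'] | reject('equalto', 'lo') | list }}"
```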
- name: Read Preflight Check Report
  slurp:
    src: ./ceph_preflight_report.txt
  register: report_content
Is this task really needed? By the way, it's probably missing a delegate_to: localhost, so I suspect it would fail. Please have a look at lookup('template') instead.
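A sketch of the lookup('template') approach: render the report once on the controller instead of writing and then slurping a file. The template name preflight_report.j2 and its contents are hypothetical; become: false avoids requesting privilege escalation on localhost while the play runs with become: true:

```yaml
# Sketch: preflight_report.j2 is a hypothetical template, for example:
#   {% for host in ansible_play_hosts %}
#   == {{ host }} ==
#   {% for r in hostvars[host].preflight_results %}
#   {{ r.Check }}: {{ r.Result }} ({{ r.Reason }})
#   {% endfor %}
#   {% endfor %}
- name: Write the preflight report on the controller
  ansible.builtin.copy:
    content: "{{ lookup('ansible.builtin.template', 'preflight_report.j2') }}"
    dest: "{{ playbook_dir }}/ceph_preflight_report.txt"
    mode: "0644"
  delegate_to: localhost
  become: false
  run_once: true
```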
@@ -0,0 +1,314 @@
---
nit: for consistency, I'd rename that file to preflight-checks.yml (_ -> -).
- name: Gather all Ansible facts
  setup:
You can get rid of this task if you make it an actual playbook with gather_facts: true.
firewalld_check: "{{ 'PASS' if firewall_status.status.ActiveState == 'active' else 'FAIL' }}"
firewalld_reason: "{{ 'Firewalld was not running and could not be started' if firewall_status.failed else 'N/A' }}"
duplicate?
selinux_check: "{{ 'PASS' if ansible_facts['selinux']['status'] == 'enabled' and ansible_facts['selinux']['mode'] == 'enforcing' else 'FAIL' }}"
selinux_reason: "{{ 'SELinux was not in enforcing mode and could not be enforced automatically' if selinux_check == 'FAIL' else 'N/A' }}"
duplicate?
- name: Collect installed package facts
  package_facts:
    manager: auto
I would move this task to the beginning of the playbook
- name: Retrieve SELinux status from ansible_facts
  setup:
    gather_subset:
      - selinux
Suggested change: remove this task.
- name: Ping all hosts in inventory
  ansible.builtin.ping:
  register: ping_results
  failed_when: false
  delegate_to: "{{ item }}"
  with_items: "{{ groups['all'] }}"
This won't give you the latency the way an actual ICMP ping would. It's just an Ansible module called 'ping', used to check whether Ansible is able to connect to the nodes.
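A sketch of an actual ICMP measurement: each host runs the system ping binary against every inventory host and reports the average round-trip time from the "rtt min/avg/max/mdev = ..." summary line. It assumes ping is installed and that ansible_host (or the inventory name) is reachable from the managed nodes; an unreachable peer makes the task fail for that host, which seems acceptable for a preflight check:

```yaml
# Sketch: real ICMP latency, unlike the ansible.builtin.ping module.
- name: Measure ICMP latency to all inventory hosts
  ansible.builtin.command:
    argv:
      - ping
      - -c
      - "3"
      - -q
      - "{{ hostvars[item]['ansible_host'] | default(item) }}"
  loop: "{{ groups['all'] }}"
  register: icmp_ping
  changed_when: false

# Last line: "rtt min/avg/max/mdev = 0.031/0.045/0.061/0.012 ms" -> avg field.
- name: Report average round-trip time
  ansible.builtin.debug:
    msg: "{{ inventory_hostname }} -> {{ res.item }}: avg {{ res.stdout_lines[-1].split('=')[1].split('/')[1] }} ms"
  loop: "{{ icmp_ping.results }}"
  loop_control:
    loop_var: res
    label: "{{ res.item }}"
```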
- name: Run checks
  import_tasks: preflight_checks.yml
Suggested change (replacing the import_tasks task above):
- name: variables validations
  ansible.builtin.import_playbook: validate/preflight.yml
setup: | ||
|
||
- name: Check if OS is RHEL 9+ | ||
set_fact: |
Please use FQCN everywhere; for example, set_fact: should become ansible.builtin.set_fact:
@Kushal-deb Please consider using a single task:
- name: Store all check results
  set_fact:
    preflight_results: >-
      {{ preflight_results + [
        {'Check': 'OS Version', 'Result': os_check, 'Reason': os_reason},
        {'Check': 'Tuned Profile', 'Result': tuned_profile_check, 'Reason': tuned_profile_reason},
        {'Check': 'RHEL Profile', 'Result': rhel_profile_check, 'Reason': rhel_profile_reason},
        {'Check': 'Firewalld Running', 'Result': firewalld_check, 'Reason': firewalld_reason},
        {'Check': 'Podman Installed', 'Result': podman_check, 'Reason': podman_reason},
        {'Check': 'SELinux', 'Result': selinux_check, 'Reason': selinux_reason},
        {'Check': 'Minimum RAM (8GB)', 'Result': memory_checks['ram']['result'], 'Reason': memory_checks['ram']['reason']},
        {'Check': 'Swap Space (1.5x RAM)', 'Result': memory_checks['swap']['result'], 'Reason': memory_checks['swap']['reason']},
        {'Check': 'CPU x86-64-v2', 'Result': cpu_checks['x86_64_v2']['result'], 'Reason': cpu_checks['x86_64_v2']['reason']},
        {'Check': 'CPU Cores >= 4', 'Result': cpu_checks['cores']['result'], 'Reason': cpu_checks['cores']['reason']},
        {'Check': '/var is a separate partition', 'Result': filesystem_checks['var_partition']['result'], 'Reason': filesystem_checks['var_partition']['reason']},
        {'Check': 'Root Filesystem >= 100GB', 'Result': filesystem_checks['root_fs']['result'], 'Reason': filesystem_checks['root_fs']['reason']},
        {'Check': 'Jumbo Frames Enabled', 'Result': jumbo_frames_check, 'Reason': jumbo_frames_reason},
        {'Check': 'Network Latency', 'Result': 'INFO', 'Reason': 'Latency results: ' ~ ping_results.results | map(attribute='ping') | list},
        {'Check': 'NIC Static IP Configuration', 'Result': nic_config_check, 'Reason': nic_config_reason},
        {'Check': 'NIC Bandwidth (10GbE Recommended)', 'Result': nic_speed_check, 'Reason': nic_speed_reason}
      ] }}
    preflight_failures: >-
      {{ preflight_failures
        + (['OS Version'] if os_check == 'FAIL' else [])
        + (['Tuned Profile'] if tuned_profile_check == 'FAIL' else [])
        + (['RHEL Profile'] if rhel_profile_check == 'FAIL' else [])
        + (['SELinux'] if selinux_check == 'FAIL' else [])
        + (['Firewalld Running'] if firewalld_check == 'FAIL' else [])
        + (['Podman Installed'] if not podman_installed else [])
        + (['Minimum RAM'] if memory_checks['ram']['result'] == 'FAIL' else [])
        + (['Swap Space'] if memory_checks['swap']['result'] == 'FAIL' else [])
        + (['CPU x86-64-v2'] if cpu_checks['x86_64_v2']['result'] == 'FAIL' else [])
        + (['CPU Cores'] if cpu_checks['cores']['result'] == 'FAIL' else [])
        + (['/var Partition'] if filesystem_checks['var_partition']['result'] == 'FAIL' else [])
        + (['Root Filesystem'] if filesystem_checks['root_fs']['result'] == 'FAIL' else []) }}
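The single task above reads memory_checks, cpu_checks and filesystem_checks dictionaries. A sketch of how they could be built from gathered facts, with thresholds taken from the check names (8 GB RAM, swap 1.5x RAM, 4 cores, 100 GB root filesystem). The x86-64-v2 entry reads /proc/cpuinfo because CPU flags are not part of the default facts, and assumes x86_64 hosts. Note that memtotal_mb is what the kernel reports, slightly below installed RAM, so the 8192 MB threshold may need a margin:

```yaml
# Sketch: x86_64 only; the '^flags' line lists CPU feature flags.
- name: Read CPU flags
  ansible.builtin.command:
    argv: [grep, -m1, "^flags", /proc/cpuinfo]
  register: cpu_flags
  changed_when: false

# Flags required by x86-64-v2 (SSE3 is reported as 'pni').
- name: Compute memory, CPU and filesystem checks
  vars:
    v2_flags: [cx16, lahf_lm, popcnt, pni, sse4_1, sse4_2, ssse3]
    root_mount: "{{ ansible_facts['mounts'] | selectattr('mount', 'equalto', '/') | first }}"
  ansible.builtin.set_fact:
    memory_checks:
      ram:
        result: "{{ 'PASS' if ansible_facts['memtotal_mb'] >= 8192 else 'FAIL' }}"
        reason: "{{ 'Total RAM: ' ~ ansible_facts['memtotal_mb'] ~ ' MB' }}"
      swap:
        result: "{{ 'PASS' if ansible_facts['swaptotal_mb'] >= ansible_facts['memtotal_mb'] * 1.5 else 'FAIL' }}"
        reason: "{{ 'Total swap: ' ~ ansible_facts['swaptotal_mb'] ~ ' MB' }}"
    cpu_checks:
      x86_64_v2:
        result: "{{ 'PASS' if v2_flags | difference(cpu_flags.stdout.split()) | length == 0 else 'FAIL' }}"
        reason: "{{ 'Missing flags: ' ~ v2_flags | difference(cpu_flags.stdout.split()) | join(', ') }}"
      cores:
        result: "{{ 'PASS' if ansible_facts['processor_vcpus'] >= 4 else 'FAIL' }}"
        reason: "{{ 'vCPUs: ' ~ ansible_facts['processor_vcpus'] }}"
    filesystem_checks:
      var_partition:
        result: "{{ 'PASS' if ansible_facts['mounts'] | selectattr('mount', 'equalto', '/var') | list | length > 0 else 'FAIL' }}"
        reason: "/var must be a separate mount point"
      root_fs:
        # size_total is reported in bytes.
        result: "{{ 'PASS' if root_mount.size_total >= 100 * 1024 * 1024 * 1024 else 'FAIL' }}"
        reason: "Root filesystem must be at least 100 GB"
```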
Implemented OS, NIC, and other preflight checks to validate system requirements before Ceph cluster creation.
Enhancements:
https://tracker.ceph.com/issues/69726
https://tracker.ceph.com/issues/69781
https://tracker.ceph.com/issues/69843
Logs: