
Commit ed43656

Author: Michael Burke
[enterprise-4.16] OSDOCS 16930 CQA2.0 of NODES-2: Node Management and Maintenance Part II
1 parent 8e462f2 commit ed43656

17 files changed: +117 −90 lines

modules/mco-update-boot-images-configuring.adoc

Lines changed: 7 additions & 3 deletions
```diff
@@ -7,11 +7,16 @@
 [id="mco-update-boot-images-configuring_{context}"]
 = Configuring updated boot images
 
-By default, {product-title} does not manage the boot image. You can configure your cluster to update the boot image whenever you update your cluster by modifying the `MachineConfiguration` object.
+[role="_abstract"]
+include::snippets/mco-update-boot-images-abstract.adoc[]
+
+Enabling the feature updates the boot image to the {op-system-first} boot image version appropriate for your cluster. If the cluster is updated to a new {product-title} version in the future, the boot image is updated again. New nodes created after enabling the feature use the updated boot image. This feature has no effect on existing nodes.
+
+To enable the boot image management feature for control plane machine sets, or to re-enable it for worker machine sets where it was disabled, edit the `MachineConfiguration` object. You can enable the feature for all of the machine sets in the cluster or for specific machine sets.
 
 .Prerequisites
 
-* You have enabled the `TechPreviewNoUpgrade` feature set by using the feature gates. For more information, see "Enabling features using feature gates" in the _Additional resources_ section.
+* If you are enabling boot image management for control plane machine sets, you enabled the required Technology Preview features for your cluster by editing the `FeatureGate` CR named `cluster`.
 
 .Procedure
 
@@ -87,7 +92,6 @@ $ oc get machineconfiguration cluster -n openshift-machine-api -o yaml
 ----
 +
 .Example machine set with the boot image reference
-+
 [source,yaml]
 ----
 kind: MachineConfiguration
```

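As context for the enablement step this module describes, a minimal `MachineConfiguration` sketch that opts all machine sets into boot image management might look like the following. The field layout is assumed from the `operator.openshift.io/v1` API in this release; verify it against your cluster's schema before use.

[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  managementState: Managed
  managedBootImages:
    machineManagers:
      - resource: machinesets
        apiGroup: machine.openshift.io
        selection:
          mode: All
----

With `mode: Partial`, a label selector under `selection.partial` limits the feature to specific machine sets instead of all of them.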
modules/mco-update-boot-images-disable.adoc

Lines changed: 2 additions & 1 deletion
```diff
@@ -7,7 +7,8 @@
 [id="mco-update-boot-images-disable_{context}"]
 = Disabling updated boot images
 
-To disable the updated boot image feature, edit the `MachineConfiguration` object so that the `machineManagers` field is an empty array.
+[role="_abstract"]
+You can disable the boot image management feature so that the Machine Config Operator (MCO) no longer manages or updates the boot image in the affected machine sets. For example, you could disable this feature for the worker nodes in order to use a custom boot image that you do not want changed.
 
 If you disable this feature after some nodes have been created with the new boot image version, any existing nodes retain their current boot image. Turning off this feature does not roll back the nodes or machine sets to the originally installed boot image. The machine sets retain the boot image version that was present when the feature was enabled, and the boot image is not updated again when the cluster is upgraded to a new {product-title} version in the future.
```

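The removed line above states the mechanism: the `machineManagers` field becomes an empty array. As a sketch (assumed field layout; verify against your cluster's schema), the disabled state looks like:

[source,yaml]
----
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-api
spec:
  managedBootImages:
    machineManagers: []
----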
modules/nodes-nodes-viewing-listing-pods.adoc

Lines changed: 2 additions & 1 deletion
```diff
@@ -6,7 +6,8 @@
 [id="nodes-nodes-viewing-listing-pods_{context}"]
 = Listing pods on a node in your cluster
 
-You can list all the pods on a specific node.
+[role="_abstract"]
+You can list all of the pods on a node by using the `oc get pods` command with specific flags. The command shows the number of pods on that node, the state of each pod, the number of pod restarts, and the age of each pod.
 
 .Procedure
```

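The abstract references `oc get pods` with specific flags; one plausible form restricts output to a single node with a field selector (the flag exists in `oc`/`kubectl`, though the exact invocation the module documents may differ):

[source,terminal]
----
$ oc get pods --all-namespaces --field-selector=spec.nodeName=<node_name> -o wide
----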
modules/nodes-nodes-viewing-listing.adoc

Lines changed: 30 additions & 27 deletions
```diff
@@ -6,7 +6,8 @@
 [id="nodes-nodes-viewing-listing_{context}"]
 = About listing all the nodes in a cluster
 
-You can get detailed information on the nodes in the cluster.
+[role="_abstract"]
+You can get detailed information about the nodes in the cluster, which can help you understand the state of the nodes in your cluster.
 
 * The following command lists all nodes:
 +
@@ -108,39 +109,39 @@ include::snippets/osd-aws-example-only.adoc[]
 .Example output
 [source,text]
 ----
-Name:               node1.example.com <1>
-Roles:              worker <2>
+Name:               node1.example.com
+Roles:              worker
 Labels:             kubernetes.io/os=linux
                     kubernetes.io/hostname=ip-10-0-131-14
-                    kubernetes.io/arch=amd64 <3>
+                    kubernetes.io/arch=amd64
                     node-role.kubernetes.io/worker=
                     node.kubernetes.io/instance-type=m4.large
                     node.openshift.io/os_id=rhcos
                     node.openshift.io/os_version=4.5
                     region=east
                     topology.kubernetes.io/region=us-east-1
                     topology.kubernetes.io/zone=us-east-1a
-Annotations:        cluster.k8s.io/machine: openshift-machine-api/ahardin-worker-us-east-2a-q5dzc <4>
+Annotations:        cluster.k8s.io/machine: openshift-machine-api/ahardin-worker-us-east-2a-q5dzc
                     machineconfiguration.openshift.io/currentConfig: worker-309c228e8b3a92e2235edd544c62fea8
                     machineconfiguration.openshift.io/desiredConfig: worker-309c228e8b3a92e2235edd544c62fea8
                     machineconfiguration.openshift.io/state: Done
                     volumes.kubernetes.io/controller-managed-attach-detach: true
 CreationTimestamp:  Wed, 13 Feb 2019 11:05:57 -0500
-Taints:             <none> <5>
+Taints:             <none>
 Unschedulable:      false
-Conditions: <6>
+Conditions:
   Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
   ----             ------  -----------------                 ------------------                ------                       -------
   OutOfDisk        False   Wed, 13 Feb 2019 15:09:42 -0500   Wed, 13 Feb 2019 11:05:57 -0500   KubeletHasSufficientDisk     kubelet has sufficient disk space available
   MemoryPressure   False   Wed, 13 Feb 2019 15:09:42 -0500   Wed, 13 Feb 2019 11:05:57 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
   DiskPressure     False   Wed, 13 Feb 2019 15:09:42 -0500   Wed, 13 Feb 2019 11:05:57 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
   PIDPressure      False   Wed, 13 Feb 2019 15:09:42 -0500   Wed, 13 Feb 2019 11:05:57 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
   Ready            True    Wed, 13 Feb 2019 15:09:42 -0500   Wed, 13 Feb 2019 11:07:09 -0500   KubeletReady                 kubelet is posting ready status
-Addresses: <7>
+Addresses:
   InternalIP:   10.0.140.16
   InternalDNS:  ip-10-0-140-16.us-east-2.compute.internal
   Hostname:     ip-10-0-140-16.us-east-2.compute.internal
-Capacity: <8>
+Capacity:
   attachable-volumes-aws-ebs:  39
   cpu:                         2
   hugepages-1Gi:               0
@@ -154,7 +155,7 @@ Allocatable:
   hugepages-2Mi:               0
   memory:                      7558116Ki
   pods:                        250
-System Info: <9>
+System Info:
   Machine ID:          63787c9534c24fde9a0cde35c13f1f66
   System UUID:         EC22BF97-A006-4A58-6AF8-0A38DEEA122A
   Boot ID:             f24ad37d-2594-46b4-8830-7f7555918325
@@ -167,7 +168,7 @@ System Info: <9>
   Kube-Proxy Version:  v1.29.4
 PodCIDR:                      10.128.4.0/24
 ProviderID:                   aws:///us-east-2a/i-04e87b31dc6b3e171
-Non-terminated Pods:          (12 in total) <10>
+Non-terminated Pods:          (12 in total)
   Namespace                               Name         CPU Requests  CPU Limits  Memory Requests  Memory Limits
   ---------                               ----         ------------  ----------  ---------------  -------------
   openshift-cluster-node-tuning-operator  tuned-hdl5q  0 (0%)        0 (0%)      0 (0%)           0 (0%)
@@ -189,7 +190,7 @@ Allocated resources:
   cpu                         380m (25%)   270m (18%)
   memory                      880Mi (11%)  250Mi (3%)
   attachable-volumes-aws-ebs  0            0
-Events: <11>
+Events:
   Type     Reason                Age              From                      Message
   ----     ------                ----             ----                      -------
   Normal   NodeHasSufficientPID  6d (x5 over 6d)  kubelet, m01.example.com  Node m01.example.com status is now: NodeHasSufficientPID
@@ -201,25 +202,27 @@ Events: <11>
   Normal   Starting              6d               kubelet, m01.example.com  Starting kubelet.
 #...
 ----
-<1> The name of the node.
-<2> The role of the node, either `master` or `worker`.
-<3> The labels applied to the node.
-<4> The annotations applied to the node.
-<5> The taints applied to the node.
-<6> The node conditions and status. The `conditions` stanza lists the `Ready`, `PIDPressure`, `MemoryPressure`, `DiskPressure` and `OutOfDisk` status. These condition are described later in this section.
-<7> The IP address and hostname of the node.
-<8> The pod resources and allocatable resources.
-<9> Information about the node host.
-<10> The pods on the node.
-<11> The events reported by the node.
-
-ifndef::openshift-rosa,openshift-dedicated[]
-
+where:
++
+--
+`Name`:: Specifies the name of the node.
+`Roles`:: Specifies the role of the node, either `master` or `worker`.
+`Labels`:: Specifies the labels applied to the node.
+`Annotations`:: Specifies the annotations applied to the node.
+`Taints`:: Specifies the taints applied to the node.
+`Conditions`:: Specifies the node conditions and status. The `conditions` stanza lists the `Ready`, `PIDPressure`, `MemoryPressure`, `DiskPressure`, and `OutOfDisk` statuses. These conditions are described later in this section.
+`Addresses`:: Specifies the IP address and hostname of the node.
+`Capacity`:: Specifies the pod resources and allocatable resources.
+`System Info`:: Specifies information about the node host.
+`Non-terminated Pods`:: Specifies the pods on the node.
+`Events`:: Specifies the events reported by the node.
+--
++
+ifndef::openshift-rosa,openshift-rosa-hcp,openshift-dedicated[]
 [NOTE]
 ====
 The control plane label is not automatically added to newly created or updated master nodes. If you want to use the control plane label for your nodes, you can manually configure the label. For more information, see _Understanding how to update labels on nodes_ in the _Additional resources_ section.
 ====
-
 endif::openshift-rosa,openshift-dedicated[]
 
 Among the information shown for nodes, the following node conditions appear in the output of the commands shown in this section:
```

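The node-listing commands this module walks through can be sketched as standard `oc` invocations:

[source,terminal]
----
$ oc get nodes                  # list all nodes and their status
$ oc get nodes -o wide          # also show IP addresses, OS image, and kernel version
$ oc describe node <node_name>  # detailed per-node output
----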
modules/nodes-nodes-viewing-memory.adoc

Lines changed: 2 additions & 3 deletions
```diff
@@ -6,9 +6,8 @@
 [id="nodes-nodes-viewing-memory_{context}"]
 = Viewing memory and CPU usage statistics on your nodes
 
-You can display usage statistics about nodes, which provide the runtime
-environments for containers. These usage statistics include CPU, memory, and
-storage consumption.
+[role="_abstract"]
+You can display usage statistics about nodes, including CPU, memory, and storage consumption. These statistics can help you ensure that your cluster is running efficiently.
 
 .Prerequisites
```

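The usage statistics this module describes come from the `oc adm top` command; for example:

[source,terminal]
----
$ oc adm top nodes             # CPU and memory usage for all nodes
$ oc adm top node <node_name>  # usage for a single node
----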
modules/nodes-nodes-working-deleting-bare-metal.adoc

Lines changed: 6 additions & 4 deletions
```diff
@@ -7,16 +7,18 @@
 [id="nodes-nodes-working-deleting-bare-metal_{context}"]
 = Deleting nodes from a bare metal cluster
 
+[role="_abstract"]
+You can delete a node from an {product-title} cluster that does not use machine sets by using the `oc delete node` command and decommissioning the node.
+
 When you delete a node using the CLI, the node object is deleted in Kubernetes,
 but the pods that exist on the node are not deleted. Any bare pods not backed by
 a replication controller become inaccessible to {product-title}. Pods backed by
 replication controllers are rescheduled to other available nodes. You must
 delete local manifest pods.
 
-.Procedure
+The following procedure deletes a node from an {product-title} cluster running on bare metal.
 
-Delete a node from an {product-title} cluster running on bare metal by completing
-the following steps:
+.Procedure
 
 . Mark the node as unschedulable:
 +
@@ -32,7 +34,7 @@ $ oc adm cordon <node_name>
 $ oc adm drain <node_name> --force=true
 ----
 +
-This step might fail if the node is offline or unresponsive. Even if the node does not respond, it might still be running a workload that writes to shared storage. To avoid data corruption, power down the physical hardware before you proceed.
+This step might fail if the node is offline or unresponsive. Even if the node does not respond, the node might still be running a workload that writes to shared storage. To avoid data corruption, power down the physical hardware before you proceed.
 
 . Delete the node from the cluster:
 +
```

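The full sequence the bare metal procedure describes, sketched end to end (node name is a placeholder):

[source,terminal]
----
$ oc adm cordon <node_name>              # mark the node as unschedulable
$ oc adm drain <node_name> --force=true  # evacuate the pods
$ oc delete node <node_name>             # remove the node object from the cluster
----

After the node object is deleted, power down and decommission the physical machine.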
modules/nodes-nodes-working-deleting.adoc

Lines changed: 6 additions & 3 deletions
```diff
@@ -6,7 +6,8 @@
 [id="nodes-nodes-working-deleting_{context}"]
 = Deleting nodes from a cluster
 
-To delete a node from the {product-title} cluster, scale down the appropriate `MachineSet` object.
+[role="_abstract"]
+You can delete a node from a {product-title} cluster by scaling down the appropriate `MachineSet` object.
 
 [IMPORTANT]
 ====
@@ -58,7 +59,9 @@ metadata:
   namespace: openshift-machine-api
 # ...
 spec:
-  replicas: 2 # <1>
+  replicas: 2
 # ...
 ----
-<1> Specify the number of replicas to scale down to.
+where:
+
+`spec.replicas`:: Specifies the number of replicas to scale down to.
```

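Besides editing `spec.replicas` directly, the same scale-down can be performed with `oc scale` (machine set name is a placeholder):

[source,terminal]
----
$ oc scale --replicas=2 machineset <machineset_name> -n openshift-machine-api
----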
modules/nodes-nodes-working-evacuating.adoc

Lines changed: 24 additions & 24 deletions
```diff
@@ -4,23 +4,23 @@
 
 :_mod-docs-content-type: PROCEDURE
 [id="nodes-nodes-working-evacuating_{context}"]
-= Understanding how to evacuate pods on nodes
+= Evacuating pods on nodes
 
-Evacuating pods allows you to migrate all or selected pods from a given node or
-nodes.
+[role="_abstract"]
+You can remove, or evacuate, pods from a given node or nodes. Evacuating pods allows you to migrate all or selected pods to other nodes.
 
-You can only evacuate pods backed by a replication controller. The replication controller creates new pods on
+You can evacuate only pods that are backed by a replication controller. The replication controller creates new pods on
 other nodes and removes the existing pods from the specified node(s).
 
 Bare pods, meaning those not backed by a replication controller, are unaffected by default.
-You can evacuate a subset of pods by specifying a pod-selector. Pod selectors are
-based on labels, so all the pods with the specified label will be evacuated.
+You can evacuate a subset of pods by specifying a pod selector. Because pod selectors are
+based on labels, all of the pods with the specified label are evacuated.
 
 .Procedure
 
-. Mark the nodes unschedulable before performing the pod evacuation.
+. Mark the nodes as unschedulable before performing the pod evacuation.
 
-.. Mark the node as unschedulable:
+.. Mark the node as unschedulable by running the following command:
 +
 [source,terminal]
 ----
@@ -33,7 +33,7 @@ $ oc adm cordon <node1>
 node/<node1> cordoned
 ----
 
-.. Check that the node status is `Ready,SchedulingDisabled`:
+.. Check that the node status is `Ready,SchedulingDisabled` by running the following command:
 +
 [source,terminal]
 ----
@@ -47,69 +47,69 @@ NAME      STATUS                     ROLES   AGE  VERSION
 <node1>   Ready,SchedulingDisabled   worker  1d   v1.29.4
 ----
 
-. Evacuate the pods using one of the following methods:
+. Evacuate the pods by using one of the following methods:
 
-** Evacuate all or selected pods on one or more nodes:
+** Evacuate all or selected pods on one or more nodes by running the `oc adm drain` command:
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> [--pod-selector=<pod_selector>]
 ----
 
-** Force the deletion of bare pods using the `--force` option. When set to
-`true`, deletion continues even if there are pods not managed by a replication
-controller, replica set, job, daemon set, or stateful set:
+** Force the deletion of bare pods by using the `--force` option with the `oc adm drain` command. When set to
+`true`, deletion continues even if there are pods not managed by a replication
+controller, replica set, job, daemon set, or stateful set.
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --force=true
 ----
 
-** Set a period of time in seconds for each pod to
-terminate gracefully, use `--grace-period`. If negative, the default value specified in the pod will
+** Set a period of time in seconds for each pod to
+terminate gracefully by using the `--grace-period` option with the `oc adm drain` command. If negative, the default value specified in the pod will
 be used:
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --grace-period=-1
 ----
 
-** Ignore pods managed by daemon sets using the `--ignore-daemonsets` flag set to `true`:
+** Ignore pods managed by daemon sets by using the `--ignore-daemonsets=true` option with the `oc adm drain` command:
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --ignore-daemonsets=true
 ----
 
-** Set the length of time to wait before giving up using the `--timeout` flag. A
-value of `0` sets an infinite length of time:
+** Set the length of time to wait before giving up by using the `--timeout` option with the `oc adm drain` command. A
+value of `0` sets an infinite length of time.
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --timeout=5s
 ----
 
-** Delete pods even if there are pods using `emptyDir` volumes by setting the `--delete-emptydir-data` flag to `true`. Local data is deleted when the node
-is drained:
+** Delete pods even if there are pods using `emptyDir` volumes by using the `--delete-emptydir-data=true` option with the `oc adm drain` command. Local data is deleted when the node
+is drained.
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --delete-emptydir-data=true
 ----
 
-** List objects that will be migrated without actually performing the evacuation,
-using the `--dry-run` option set to `true`:
+** List objects that would be migrated without actually performing the evacuation
+by using the `--dry-run=true` option with the `oc adm drain` command:
 +
 [source,terminal]
 ----
 $ oc adm drain <node1> <node2> --dry-run=true
 ----
 +
 Instead of specifying specific node names (for example, `<node1> <node2>`), you
-can use the `--selector=<node_selector>` option to evacuate pods on selected
+can use the `--selector=<node_selector>` option with the `oc adm drain` command to evacuate pods on selected
 nodes.
 
-. Mark the node as schedulable when done.
+. Mark the node as schedulable when done by running the following command:
 +
 [source,terminal]
 ----
```

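The individual `oc adm drain` options shown in this module can be combined in a single invocation. A sketch of a typical full drain (values are illustrative):

[source,terminal]
----
$ oc adm drain <node_name> \
    --force=true \
    --ignore-daemonsets=true \
    --delete-emptydir-data=true \
    --grace-period=30 \
    --timeout=300s
----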
modules/nodes-nodes-working-marking.adoc

Lines changed: 7 additions & 3 deletions
```diff
@@ -6,10 +6,14 @@
 [id="nodes-nodes-working-marking_{context}"]
 = Understanding how to mark nodes as unschedulable or schedulable
 
+[role="_abstract"]
+You can mark a node as unschedulable in order to block any new pods from being scheduled on the node.
+
+When you mark a node as unschedulable, existing pods on the node are not affected.
+
 By default, healthy nodes with a `Ready` status are
 marked as schedulable, which means that you can place new pods on the
-node. Manually marking a node as unschedulable blocks any new pods from being
-scheduled on the node. Existing pods on the node are not affected.
+node.
 
 * The following command marks a node or nodes as unschedulable:
 +
@@ -42,5 +46,5 @@ node1.example.com kubernetes.io/hostname=node1.example.com Ready,Schedul
 $ oc adm uncordon <node1>
 ----
 +
-Alternatively, instead of specifying specific node names (for example, `<node>`), you can use the `--selector=<node_selector>` option to mark selected
+Instead of specifying specific node names (for example, `<node>`), you can use the `--selector=<node_selector>` option to mark selected
 nodes as schedulable or unschedulable.
```

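Using the `--selector` option mentioned at the end of the module, cordoning a labeled group of nodes might look like the following (the label is illustrative):

[source,terminal]
----
$ oc adm cordon --selector=node-role.kubernetes.io/worker=
$ oc adm uncordon --selector=node-role.kubernetes.io/worker=
----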