Skip to content

Commit a84ef5d

Browse files
committed
[FLINK-33392][docs] Add the documentation pages for balanced tasks scheduling.
1 parent 1e36750 commit a84ef5d

File tree

7 files changed

+1520
-0
lines changed

7 files changed

+1520
-0
lines changed
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
---
2+
title: Tasks Scheduling
3+
bookCollapseSection: true
4+
weight: 9
5+
---
6+
<!--
7+
Licensed to the Apache Software Foundation (ASF) under one
8+
or more contributor license agreements. See the NOTICE file
9+
distributed with this work for additional information
10+
regarding copyright ownership. The ASF licenses this file
11+
to you under the Apache License, Version 2.0 (the
12+
"License"); you may not use this file except in compliance
13+
with the License. You may obtain a copy of the License at
14+
15+
http://www.apache.org/licenses/LICENSE-2.0
16+
17+
Unless required by applicable law or agreed to in writing,
18+
software distributed under the License is distributed on an
19+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
20+
KIND, either express or implied. See the License for the
21+
specific language governing permissions and limitations
22+
under the License.
23+
-->
Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
---
2+
title: Balanced Tasks Scheduling
3+
weight: 5
4+
type: docs
5+
6+
---
7+
<!--
8+
Licensed to the Apache Software Foundation (ASF) under one
9+
or more contributor license agreements. See the NOTICE file
10+
distributed with this work for additional information
11+
regarding copyright ownership. The ASF licenses this file
12+
to you under the Apache License, Version 2.0 (the
13+
"License"); you may not use this file except in compliance
14+
with the License. You may obtain a copy of the License at
15+
16+
http://www.apache.org/licenses/LICENSE-2.0
17+
18+
Unless required by applicable law or agreed to in writing,
19+
software distributed under the License is distributed on an
20+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
21+
KIND, either express or implied. See the License for the
22+
specific language governing permissions and limitations
23+
under the License.
24+
-->
25+
26+
# Balanced Tasks Scheduling
27+
28+
This page describes the background and principle of balanced tasks scheduling,
29+
how to use it when running streaming jobs.
30+
31+
## Background
32+
33+
When the parallelism of all vertices within a Flink streaming job is inconsistent,
34+
the [default strategy]({{< ref "docs/deployment/config" >}}#taskmanager-load-balance-mode)
35+
of Flink to deploy tasks sometimes leads some `TaskManagers` have more tasks while others have fewer tasks,
36+
resulting in excessive resource utilization at some `TaskManagers`
37+
that contain more tasks and becoming a bottleneck for the entire job processing.
38+
39+
{{< img src="/fig/deployments/tasks-scheduling/tasks_scheduling_skew_case.svg" alt="The Skew Case of Tasks Scheduling" class="offset" width="50%" >}}
40+
41+
As shown in figure (a), given a Flink job comprising two vertices, `JobVertex-A (JV-A)` and `JobVertex-B (JV-B)`,
42+
with parallelism degrees of `6` and `3` respectively,
43+
and both vertices sharing the same slot sharing group.
44+
Under the default tasks scheduling strategy, as illustrated in figure (b),
45+
the distribution of tasks across `TaskManagers` may result in significant disparities in task load.
46+
Specifically, the `TaskManager`s with the highest number of tasks may host `4` tasks,
47+
while the one with the lowest load may have only `2` tasks.
48+
Consequently, the `TaskManager`s bearing 4 tasks is prone to become a performance bottleneck for the entire job.
49+
50+
Therefore, Flink provides the task-quantity-based balanced tasks scheduling capability.
51+
Within the job's resource view, it aims to ensure that the number of tasks
52+
scheduled to each `TaskManager` as close as possible to,
53+
thereby improving the resource usage skew among `TaskManagers`.
54+
55+
<span class="label label-info">Note</span> The presence of inconsistent parallelism does not imply that this strategy must be used, as this is not always the case in practice.
56+
57+
## Principle
58+
59+
The task-quantity-based load balancing tasks scheduling strategy completes the assignment of tasks to `TaskManagers` in two phases:
60+
- The tasks-to-slots assignment phase
61+
- The slots-to-TaskManagers assignment phase
62+
63+
This section will use two examples to illustrate the simplified process and principle of
64+
how the task-quantity-based tasks scheduling strategy handles the assignments in these two phases.
65+
66+
### The tasks-to-slots assignment phase
67+
68+
Taking the job shown in figure (c) as an example, it contains five job vertices with parallelism degrees of `1`, `4`, `4`, `2`, and `3`, respectively.
69+
All five job vertices belong to the default slot sharing group.
70+
71+
{{< img src="/fig/deployments/tasks-scheduling/tasks_to_slots_allocation_principle.svg" alt="The Tasks To Slots Allocation Principle Demo" class="offset" width="65%" >}}
72+
73+
During the tasks-to-slots assignment phase, this tasks scheduling strategy:
74+
- First directly assigns the tasks of the vertices with the highest parallelism to the `i-th` slot.
75+
76+
That is, task `JV-Bi` is assigned directly to `sloti`, and task `JV-Ci` is assigned directly to `sloti`.
77+
78+
- Next, for tasks belonging to job vertices with sub-maximal parallelism, they are assigned in a round-robin fashion across the slots within the current
79+
slot sharing group until all tasks are allocated.
80+
81+
As shown in figure (e), under the task-quantity-based assignment strategy, the range (max-min difference) of the number of tasks per slot is `1`,
82+
which is better than the range of `3` under the default strategy shown in figure (d).
83+
84+
Thus, this ensures a more balanced distribution of the number of tasks across slots.
85+
86+
### The slots-to-TaskManagers assignment phase
87+
88+
As shown in figure (f), given a Flink job comprising two vertices, `JV-A` and `JV-B`, with parallelism of `6` and `3` respectively,
89+
and both vertices sharing the same slot sharing group.
90+
91+
{{< img src="/fig/deployments/tasks-scheduling/slots_to_taskmanagers_allocation_principle.svg" alt="The Slots to TaskManagers Allocation Principle Demo" class="offset" width="75%" >}}
92+
93+
The assignment result after the first phase is shown in figure (g),
94+
where `Slot0`, `Slot1`, and `Slot2` each contain `2` tasks, while the remaining slots contain `1` task each.
95+
96+
Subsequently:
97+
- The strategy submits all slot requests and waits until all slot resources required for the current job are ready.
98+
99+
Once the slot resources are ready:
100+
- The strategy then sorts all slot requests in descending order based on the number of tasks contained in each request.
101+
Afterward, it sequentially assigns each slot request to the `TaskManager` with the smallest current tasks loading.
102+
This process continues until all slot requests have been allocated.
103+
104+
The final assignment result is shown in figure (i), where each `TaskManager` ends up with exactly `3` tasks,
105+
resulting in a task count difference of `0` between `TaskManagers`. In contrast, the scheduling result under the default strategy,
106+
shown in figure (h), has a task count difference of `2` between `TaskManagers`.
107+
108+
Therefore, if you are seeing performance bottlenecks of the sort described above,
109+
then using this load balancing tasks scheduling strategy can improve performance.
110+
Be aware that you should not use this strategy, if you are not seeing these bottlenecks,
111+
as you may experience performance degradation.
112+
113+
## Usage
114+
115+
You can enable balanced tasks scheduling through the following configuration item:
116+
117+
- `taskmanager.load-balance.mode`: `tasks`
118+
119+
## More details
120+
121+
See the <a href="https://cwiki.apache.org/confluence/x/U56zDw">FLIP-370</a> for more details.
122+
123+
{{< top >}}
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
---
2+
title: Tasks Scheduling
3+
bookCollapseSection: true
4+
weight: 9
5+
---
6+
<!--
7+
Licensed to the Apache Software Foundation (ASF) under one
8+
or more contributor license agreements. See the NOTICE file
9+
distributed with this work for additional information
10+
regarding copyright ownership. The ASF licenses this file
11+
to you under the Apache License, Version 2.0 (the
12+
"License"); you may not use this file except in compliance
13+
with the License. You may obtain a copy of the License at
14+
15+
http://www.apache.org/licenses/LICENSE-2.0
16+
17+
Unless required by applicable law or agreed to in writing,
18+
software distributed under the License is distributed on an
19+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
20+
KIND, either express or implied. See the License for the
21+
specific language governing permissions and limitations
22+
under the License.
23+
-->
Lines changed: 123 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,123 @@
1+
---
2+
title: Balanced Tasks Scheduling
3+
weight: 5
4+
type: docs
5+
6+
---
7+
<!--
8+
Licensed to the Apache Software Foundation (ASF) under one
9+
or more contributor license agreements. See the NOTICE file
10+
distributed with this work for additional information
11+
regarding copyright ownership. The ASF licenses this file
12+
to you under the Apache License, Version 2.0 (the
13+
"License"); you may not use this file except in compliance
14+
with the License. You may obtain a copy of the License at
15+
16+
http://www.apache.org/licenses/LICENSE-2.0
17+
18+
Unless required by applicable law or agreed to in writing,
19+
software distributed under the License is distributed on an
20+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
21+
KIND, either express or implied. See the License for the
22+
specific language governing permissions and limitations
23+
under the License.
24+
-->
25+
26+
# Balanced Tasks Scheduling
27+
28+
This page describes the background and principle of balanced tasks scheduling,
29+
how to use it when running streaming jobs.
30+
31+
## Background
32+
33+
When the parallelism of all vertices within a Flink streaming job is inconsistent,
34+
the [default strategy]({{< ref "docs/deployment/config" >}}#taskmanager-load-balance-mode)
35+
of Flink to deploy tasks sometimes leads some `TaskManagers` have more tasks while others have fewer tasks,
36+
resulting in excessive resource utilization at some `TaskManagers`
37+
that contain more tasks and becoming a bottleneck for the entire job processing.
38+
39+
{{< img src="/fig/deployments/tasks-scheduling/tasks_scheduling_skew_case.svg" alt="The Skew Case of Tasks Scheduling" class="offset" width="50%" >}}
40+
41+
As shown in figure (a), given a Flink job comprising two vertices, `JobVertex-A (JV-A)` and `JobVertex-B (JV-B)`,
42+
with parallelism degrees of `6` and `3` respectively,
43+
and both vertices sharing the same slot sharing group.
44+
Under the default tasks scheduling strategy, as illustrated in figure (b),
45+
the distribution of tasks across `TaskManagers` may result in significant disparities in task load.
46+
Specifically, the `TaskManager`s with the highest number of tasks may host `4` tasks,
47+
while the one with the lowest load may have only `2` tasks.
48+
Consequently, the `TaskManager`s bearing 4 tasks is prone to become a performance bottleneck for the entire job.
49+
50+
Therefore, Flink provides the task-quantity-based balanced tasks scheduling capability.
51+
Within the job's resource view, it aims to ensure that the number of tasks
52+
scheduled to each `TaskManager` as close as possible to,
53+
thereby improving the resource usage skew among `TaskManagers`.
54+
55+
<span class="label label-info">Note</span> The presence of inconsistent parallelism does not imply that this strategy must be used, as this is not always the case in practice.
56+
57+
## Principle
58+
59+
The task-quantity-based load balancing tasks scheduling strategy completes the assignment of tasks to `TaskManagers` in two phases:
60+
- The tasks-to-slots assignment phase
61+
- The slots-to-TaskManagers assignment phase
62+
63+
This section will use two examples to illustrate the simplified process and principle of
64+
how the task-quantity-based tasks scheduling strategy handles the assignments in these two phases.
65+
66+
### The tasks-to-slots assignment phase
67+
68+
Taking the job shown in figure (c) as an example, it contains five job vertices with parallelism degrees of `1`, `4`, `4`, `2`, and `3`, respectively.
69+
All five job vertices belong to the default slot sharing group.
70+
71+
{{< img src="/fig/deployments/tasks-scheduling/tasks_to_slots_allocation_principle.svg" alt="The Tasks To Slots Allocation Principle Demo" class="offset" width="65%" >}}
72+
73+
During the tasks-to-slots assignment phase, this tasks scheduling strategy:
74+
- First directly assigns the tasks of the vertices with the highest parallelism to the `i-th` slot.
75+
76+
That is, task `JV-Bi` is assigned directly to `sloti`, and task `JV-Ci` is assigned directly to `sloti`.
77+
78+
- Next, for tasks belonging to job vertices with sub-maximal parallelism, they are assigned in a round-robin fashion across the slots within the current
79+
slot sharing group until all tasks are allocated.
80+
81+
As shown in figure (e), under the task-quantity-based assignment strategy, the range (max-min difference) of the number of tasks per slot is `1`,
82+
which is better than the range of `3` under the default strategy shown in figure (d).
83+
84+
Thus, this ensures a more balanced distribution of the number of tasks across slots.
85+
86+
### The slots-to-TaskManagers assignment phase
87+
88+
As shown in figure (f), given a Flink job comprising two vertices, `JV-A` and `JV-B`, with parallelism of `6` and `3` respectively,
89+
and both vertices sharing the same slot sharing group.
90+
91+
{{< img src="/fig/deployments/tasks-scheduling/slots_to_taskmanagers_allocation_principle.svg" alt="The Slots to TaskManagers Allocation Principle Demo" class="offset" width="75%" >}}
92+
93+
The assignment result after the first phase is shown in figure (g),
94+
where `Slot0`, `Slot1`, and `Slot2` each contain `2` tasks, while the remaining slots contain `1` task each.
95+
96+
Subsequently:
97+
- The strategy submits all slot requests and waits until all slot resources required for the current job are ready.
98+
99+
Once the slot resources are ready:
100+
- The strategy then sorts all slot requests in descending order based on the number of tasks contained in each request.
101+
Afterward, it sequentially assigns each slot request to the `TaskManager` with the smallest current tasks loading.
102+
This process continues until all slot requests have been allocated.
103+
104+
The final assignment result is shown in figure (i), where each `TaskManager` ends up with exactly `3` tasks,
105+
resulting in a task count difference of `0` between `TaskManagers`. In contrast, the scheduling result under the default strategy,
106+
shown in figure (h), has a task count difference of `2` between `TaskManagers`.
107+
108+
Therefore, if you are seeing performance bottlenecks of the sort described above,
109+
then using this load balancing tasks scheduling strategy can improve performance.
110+
Be aware that you should not use this strategy, if you are not seeing these bottlenecks,
111+
as you may experience performance degradation.
112+
113+
## Usage
114+
115+
You can enable balanced tasks scheduling through the following configuration item:
116+
117+
- `taskmanager.load-balance.mode`: `tasks`
118+
119+
## More details
120+
121+
See the <a href="https://cwiki.apache.org/confluence/x/U56zDw">FLIP-370</a> for more details.
122+
123+
{{< top >}}

0 commit comments

Comments
 (0)