

@RocMarshal
Contributor

What is the purpose of the change

[FLINK-33392][docs] Add the documentation pages for balanced tasks scheduling.

Brief change log

[FLINK-33392][docs] Add the documentation pages for balanced tasks scheduling.

Verifying this change

N/A

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Collaborator

flinkbot commented Oct 18, 2025

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • `@flinkbot run azure`: re-run the last Azure build

@RocMarshal marked this pull request as draft October 19, 2025 00:49
@RocMarshal force-pushed the FLINK-33392 branch 2 times, most recently from 56086b3 to 583dd4b October 19, 2025 15:46
@RocMarshal
Contributor Author

The Chinese edition page will be added as a copy in the corresponding location once the English edition page is ready.

@RocMarshal marked this pull request as ready for review October 20, 2025 01:41
@RocMarshal requested a review from davidradl October 21, 2025 15:35
@github-actions bot added the community-reviewed label Oct 21, 2025

## Background

When the parallelism of all vertices within a Flink streaming job is inconsistent,
Contributor

@davidradl Nov 6, 2025

I think the situation this helps with is when the parallelism is lower in a subsequent vertex of the job, not when parallelism is merely inconsistent (which could mean that we have 2 Kafka sources with different numbers of partitions; I guess this change would not affect them). Could this be a badly configured job? If not, it would be useful to detail when this can occur.

Are there other topologies that this affects?

Contributor Author

> which could mean that we have 2 Kafka sources with different numbers of partitions - I guess this change would not affect them

This algorithm only focuses on balancing the number of tasks, and specific cases require individual discussion.

> Could this be a badly configured job? If not, it would be useful to detail when this can occur.

The business scenarios faced by Flink jobs are complex, so whether a configuration is reasonable depends on whether the job's performance meets the throughput requirements of the business data it processes.
Therefore, we cannot judge in isolation whether the parallelism is properly configured.

> Are there other topologies that this affects?

No, there are not.

CC @davidradl

Contributor

That makes sense. I would expand a bit on "inconsistent", mentioning parallelism. The worry is that it could be read as saying that inconsistent parallelism requires the use of this strategy, which is not always the case.

Contributor Author

@davidradl
Good point!
What about adding a note like the following?

<span class="label label-info">Note</span> Inconsistent parallelism across vertices does not by itself mean that this strategy must be used; in practice, it is not always needed.

resulting in a task count difference of `0` between `TaskManagers`. In contrast, the scheduling result under the default strategy,
shown in figure (h), has a task count difference of `2` between `TaskManagers`.
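To make the "task count difference" metric concrete, here is a minimal Java sketch. It assumes the difference means the largest number of tasks deployed on any single TaskManager minus the smallest; the class `TaskCountSpread` and the method `taskCountDifference` are hypothetical and not part of Flink's API, and the sample assignments are illustrative rather than the data behind figure (h).

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical helper, not part of Flink: measures how unevenly tasks are spread over TaskManagers. */
public final class TaskCountSpread {

    private TaskCountSpread() {}

    /**
     * Returns the largest per-TaskManager task count minus the smallest one.
     * A result of 0 means every TaskManager runs the same number of tasks.
     */
    static int taskCountDifference(Map<String, Integer> tasksPerTaskManager) {
        if (tasksPerTaskManager.isEmpty()) {
            return 0; // no TaskManagers, nothing to compare
        }
        int max = Collections.max(tasksPerTaskManager.values());
        int min = Collections.min(tasksPerTaskManager.values());
        return max - min;
    }

    public static void main(String[] args) {
        // Illustrative assignments only; they are not the data behind figure (h).
        Map<String, Integer> evenSpread = Map.of("tm-1", 4, "tm-2", 4);
        Map<String, Integer> unevenSpread = Map.of("tm-1", 5, "tm-2", 3);
        System.out.println(taskCountDifference(evenSpread));   // prints 0
        System.out.println(taskCountDifference(unevenSpread)); // prints 2
    }
}
```

A max-minus-min measure is the simplest one that matches the `0` versus `2` comparison; it ignores how the remaining TaskManagers are distributed between the two extremes.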

Therefore, theoretically, using this load balancing tasks scheduling strategy could effectively mitigate the issue of
Contributor

nit: I would remove the words "theoretically" and "effectively"; I think that would make the statement stronger.

Contributor Author

Thanks @davidradl

I removed "effectively" but kept "theoretically" for rigor.
After all, each job has unique business characteristics and load patterns, so it is difficult to guarantee that there is no scenario in which a Flink job theoretically fits this strategy but actually suffers performance degradation when it is enabled. What do you think?

Contributor

I would remove "theoretically" and be explicit about the considerations. Maybe:

If you are seeing performance bottlenecks of the sort described above, then using this load balancing tasks scheduling strategy can improve performance.
Be aware that you should not use this strategy if you are not seeing these bottlenecks, as you may experience performance degradation.

Contributor Author

@davidradl
Thanks

Updated as:

Therefore, if you are seeing performance bottlenecks of the sort described above,
then using this load balancing tasks scheduling strategy can improve performance.
Be aware that you should not use this strategy if you are not seeing these bottlenecks,
as you may experience performance degradation.

@RocMarshal force-pushed the FLINK-33392 branch 2 times, most recently from a8d7f71 to f5586b0 November 6, 2025 11:20

Labels

community-reviewed PR has been reviewed by the community.


3 participants