Description
What would you like to be added:
When creating a graceful eviction task for an application failover, we should check whether a graceful eviction task from a recent failover already exists, and preserve its state if the application never transitioned to a healthy state.
Why is this needed:
We currently use the state preservation feature to ensure Flink applications can resume gracefully after an application-level or cluster-level failure. This depends on the correct jobID being preserved.
However, an application may be failed over to another cluster and then fail to transition to healthy within the tolerationSeconds. If this happens, Karmada will attempt to trigger another failover and fetch the jobID of the new Flink job. Fetching the latest state then fails, since no state has been saved for that jobID.
Ideally, the new task should copy the state that was preserved by the previous graceful eviction task, so that the previous state is kept until the application can transition to a healthy state.
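To make the proposal concrete, here is a minimal Go sketch of the carry-over logic. It is an illustration only, not Karmada's implementation: `EvictionTask`, `NewFailoverTask`, `copyState`, the `reasonApplicationFailure` constant and the `becameHealthy` flag are all hypothetical names introduced for this example, and how the controller would determine that the application never became healthy is left out of scope.

```go
package main

// reasonApplicationFailure is a hypothetical marker for tasks produced by
// application failover (an assumption for this sketch).
const reasonApplicationFailure = "ApplicationFailure"

// EvictionTask is a simplified, hypothetical stand-in for a graceful eviction
// task. It is not Karmada's actual API type.
type EvictionTask struct {
	FromCluster    string            // cluster the application is evicted from
	Reason         string            // why the eviction was triggered
	PreservedState map[string]string // state captured at eviction, e.g. the Flink jobID
}

// NewFailoverTask builds the eviction task for a new application failover.
//
// freshState is the state read from the cluster that is failing right now.
// If the application never became healthy after the previous failover, that
// state belongs to a job with nothing saved yet, so the state preserved by the
// most recent earlier failover task is copied instead.
func NewFailoverTask(existing []EvictionTask, fromCluster string,
	freshState map[string]string, becameHealthy bool) EvictionTask {

	task := EvictionTask{
		FromCluster:    fromCluster,
		Reason:         reasonApplicationFailure,
		PreservedState: freshState,
	}
	if becameHealthy {
		// The application ran healthily, so its current state is trustworthy.
		return task
	}
	// Walk from the newest task to the oldest (assumes append order).
	for i := len(existing) - 1; i >= 0; i-- {
		prev := existing[i]
		if prev.Reason == reasonApplicationFailure && len(prev.PreservedState) > 0 {
			task.PreservedState = copyState(prev.PreservedState)
			break
		}
	}
	return task
}

// copyState returns an independent copy so the two tasks do not share a map.
func copyState(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}
```

With one earlier task that preserved `jobID: job-a`, calling `NewFailoverTask` with `becameHealthy` set to false and a fresh `jobID: job-b` yields a task that still carries `job-a`. When no earlier task exists, or the application did become healthy, the fresh state is used unchanged.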