Fail partition processor on scheduler errors #4952
Summary: Make sure that the partition processor fails when it encounters scheduler errors
tillrohrmann
left a comment
Thanks for creating this PR @muhamadazmy. Can we ensure that the PP fails if there is a scheduler error?
            }
        }
        let result = scheduler.schedule_next(vqueue_metas).await;
        Some((ActionEffect::Scheduler(result), scheduler))
Instead of forwarding the error and logging it in a different place, can we let run() fail if we encounter a scheduler error? Then the PP should fail as well.
The PP still fails when applying the action effects! I don't just log the error.
I first aimed to fail run() directly, but that wasn't possible cleanly without too many changes and some extra allocations.
Instead, I emit the scheduler decision result and then fail while applying the action, which still fails the PP.
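The pattern described here, forwarding the scheduler result as an effect and failing only when that effect is applied, can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code in this PR: `Decision`, `SchedulerError`, `apply_effect`, and `run_loop` are hypothetical stand-ins, and only the `ActionEffect::Scheduler(result)` shape is taken from the diff.

```rust
use std::fmt;

/// Hypothetical stand-in for a successful scheduler decision.
#[derive(Debug, PartialEq)]
struct Decision(u64);

/// Hypothetical stand-in for a scheduler failure.
#[derive(Debug, PartialEq)]
struct SchedulerError(String);

impl fmt::Display for SchedulerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "scheduler error: {}", self.0)
    }
}

impl std::error::Error for SchedulerError {}

/// The effect carries the scheduler's `Result` unchanged, mirroring
/// `ActionEffect::Scheduler(result)` from the diff.
enum ActionEffect {
    Scheduler(Result<Decision, SchedulerError>),
}

/// Applying an effect is the point where a scheduler error surfaces.
fn apply_effect(
    effect: ActionEffect,
    applied: &mut Vec<Decision>,
) -> Result<(), SchedulerError> {
    match effect {
        ActionEffect::Scheduler(result) => {
            // `?` propagates the forwarded error to the caller.
            applied.push(result?);
            Ok(())
        }
    }
}

/// Hypothetical processor loop: the first failing effect aborts the loop,
/// so later effects are never applied.
fn run_loop(effects: Vec<ActionEffect>) -> Result<Vec<Decision>, SchedulerError> {
    let mut applied = Vec::new();
    for effect in effects {
        apply_effect(effect, &mut applied)?;
    }
    Ok(applied)
}
```

In this sketch the error is not logged and dropped: it travels inside the effect and terminates `run_loop` as soon as the effect is applied, which corresponds to the PP failing.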
One of the reasons I didn't just poll the scheduler in the select! block is cancellation safety: since all_streams is created on the stack on each call to run(), racing it with the scheduler would cause the loss of already polled effects.
All other branches in the select! block return with an error, which is why it's okay now.
Sorry, I missed the part about propagating the error. This should be fine.
What I don't fully understand is the cancellation safety that we get with stream_select! compared to polling the scheduler directly in the select! statement. Could you help me understand?
I was mistaken: the ready_chunk returns immediately if it has collected some items and the remaining streams return pending, so there is no risk of loss.
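To make that behaviour concrete, here is a simplified, synchronous model of a single poll of a "ready chunk" adapter. It assumes the comment refers to semantics like those of `ready_chunks` in the `futures` crate; `poll_ready_chunk` and the scripted `source` queue are hypothetical helpers written for illustration, not the library's implementation.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Simplified model of one poll of a "ready chunk" adapter.
/// `source` is a hypothetical scripted stream: each entry is what the
/// underlying stream would return on its next poll.
fn poll_ready_chunk<T>(
    source: &mut VecDeque<Poll<Option<T>>>,
    capacity: usize,
) -> Poll<Option<Vec<T>>> {
    let mut items = Vec::new();
    loop {
        match source.pop_front() {
            // The underlying stream is pending (an exhausted script is
            // treated as pending too): hand back what was collected so far.
            Some(Poll::Pending) | None => {
                return if items.is_empty() {
                    Poll::Pending
                } else {
                    Poll::Ready(Some(items))
                };
            }
            // An item is ready: buffer it and stop once the chunk is full.
            Some(Poll::Ready(Some(item))) => {
                items.push(item);
                if items.len() >= capacity {
                    return Poll::Ready(Some(items));
                }
            }
            // The stream ended: flush a partial chunk first and keep the
            // end marker in the script for the next poll.
            Some(Poll::Ready(None)) => {
                return if items.is_empty() {
                    Poll::Ready(None)
                } else {
                    source.push_front(Poll::Ready(None));
                    Poll::Ready(Some(items))
                };
            }
        }
    }
}
```

In this model, items that were already pulled from the source are always returned by the same poll that collected them, so a `Pending` from the underlying stream does not discard them.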
@tillrohrmann, please let me know whether we can merge this for v1.7.