
Dealing with {dead,live}locked applications #264


Open
Nemo157 opened this issue May 27, 2019 · 6 comments
Labels
design Open design question

Comments

@Nemo157
Contributor

Nemo157 commented May 27, 2019

This is mostly an application design issue, but maybe there are ways in which Tide can help users to design their application, or provide middleware that can detect/help resolve issues.

There are a few ways an application can get itself {dead,live}locked, e.g. deadlocking while locking multiple mutexes from the application State, or panicking while holding a mutex such that all future requests instantly error with a PoisonError.
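
As a concrete illustration of the first case, here is a minimal sketch (the State fields are hypothetical) of two handlers that lock the same pair of mutexes in opposite order and can therefore deadlock under concurrent requests:

use std::sync::Mutex;

// Hypothetical application State with two independently locked fields.
struct State {
    accounts: Mutex<Vec<u64>>,
    audit_log: Mutex<Vec<String>>,
}

// Handler A locks `accounts`, then `audit_log`.
fn transfer(state: &State) {
    let accounts = state.accounts.lock().unwrap();
    let mut log = state.audit_log.lock().unwrap();
    log.push(format!("balances: {:?}", *accounts));
}

// Handler B locks `audit_log`, then `accounts`. If A and B run
// concurrently, each can end up holding one lock while waiting
// forever for the other: a classic lock-ordering deadlock.
fn report(state: &State) {
    let log = state.audit_log.lock().unwrap();
    let accounts = state.accounts.lock().unwrap();
    println!("{} log entries, {} accounts", log.len(), accounts.len());
}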

A common way to deal with this in today's landscape of service supervisors is to have a "health check" endpoint that verifies whether the service is currently usable; a supervisor (such as Kubernetes) then uses this endpoint to decide whether an instance of the service should be taken out of rotation and restarted. Tide could potentially provide pieces of a solution like this to make it easier for developers, or include examples of how such an endpoint could be designed to guide developers.
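
As a rough sketch of what such a check could look like (the State field, handler shape, and status codes here are assumptions, not an actual Tide API):

use std::sync::Mutex;

struct State {
    the_thing: Mutex<Vec<u64>>,
}

// Status a hypothetical /healthz endpoint could respond with: 200 if the
// shared state is still usable, 503 if a poisoned lock means every real
// request would fail, signalling the supervisor to take this instance
// out of rotation and restart it.
fn health_status(state: &State) -> u16 {
    if state.the_thing.is_poisoned() { 503 } else { 200 }
}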

(split off from #263 (comment))

@prasannavl
Contributor

I think it'd be really cool if we could have this and make it part of an even larger middleware that provides more common instrumentation, including custom instrumentation (similar to https://golang.org/pkg/expvar/).
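
For instance, a very rough expvar-style sketch (all names made up) of the kind of registry such an instrumentation middleware could expose on a debug endpoint:

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Handlers bump named counters; the middleware could serialize the whole
// map as JSON on e.g. /debug/vars, similar to Go's expvar.
struct Metrics {
    counters: HashMap<&'static str, AtomicU64>,
}

impl Metrics {
    fn incr(&self, name: &'static str) {
        if let Some(counter) = self.counters.get(name) {
            counter.fetch_add(1, Ordering::Relaxed);
        }
    }
}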

@mehcode
Contributor

mehcode commented Jun 2, 2019

Since we're talking about PoisonError and parking-lot, as a PSA: please look at futures::lock::Mutex. std::sync::Mutex is under the std::sync umbrella and is for synchronization between threads.

Usage becomes:

// in a request handler, where `the_thing` is a futures::lock::Mutex in the app State
let the_thing = cx.state().the_thing.lock().await;

No PoisonError to be concerned with.


This is probably out of scope for Tide and more something that Rust itself should address, but IO-blocking behavior in an async fn can be a huge footgun (as it's not always obvious). Perhaps some kind of instrumentation in debug Rust builds could at least issue runtime warnings.
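
A small sketch of the footgun (the non-blocking variant assumes an async-std style runtime with a task-aware timer):

use std::time::Duration;

// Looks harmless, but this blocks the executor thread for a full second;
// no other task scheduled on that thread makes progress in the meantime.
async fn handler_blocking() {
    std::thread::sleep(Duration::from_secs(1));
}

// A task-aware timer yields back to the executor instead of blocking it.
async fn handler_yielding() {
    async_std::task::sleep(Duration::from_secs(1)).await;
}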

@Nemo157
Contributor Author

Nemo157 commented Jun 3, 2019

> No PoisonError to be concerned with.

Unless the waker panics when attempting to wake another task after we abandon acquiring the mutex; that will poison the std::sync::Mutex protecting the list of wakers.

The scope of poisoning is much smaller, since futures::lock::MutexGuard doesn't poison when dropped during a panic, but there are still situations that can poison a futures::lock::Mutex and result in live-locking all future requests that rely on it.

It would probably still be good to have a better example of live-locking that doesn't involve such unlikely situations or the use of incorrect synchronization primitives (futures::lock really could be futures::sync; it contains utilities for synchronization between tasks).

@mehcode
Contributor

mehcode commented Jun 3, 2019

> Unless the waker panics when attempting to wake another task after we abandon acquiring the mutex; that will poison the std::sync::Mutex protecting the list of wakers.

Fair enough. I had meant that you don't need to deal with the potential error return, but it seems it's a panic instead... not sure I like that. I didn't review the lock that closely.


> Probably would still be good to have a better example [...]

Totally agree that good examples should be written. I do feel mutexes in general (even futures-aware ones) are weird in async code. You generally want a lock-free data structure if anything; state is bad (tm) and all that.
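
A trivial sketch of that direction (hypothetical field names), keeping shared state in an atomic instead of behind a mutex:

use std::sync::atomic::{AtomicU64, Ordering};

// Nothing to poison, no guard to hold across an .await, nothing to deadlock on.
struct State {
    requests_served: AtomicU64,
}

fn record_request(state: &State) -> u64 {
    state.requests_served.fetch_add(1, Ordering::Relaxed) + 1
}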


> futures::lock really could be futures::sync, it contains utilities for synchronization between tasks

It was, in futures v0.1. Not sure of the reasoning for renaming the module in 0.3.

@yoshuawuyts
Member

Notes from triage: this is not a priority at the moment, but we do think we'll eventually want to revisit this. We're keeping the issue open to track the discussion.

@bIgBV added the design (Open design question) label on Jul 25, 2019
@skade
Contributor

skade commented Nov 5, 2019

While I agree with the general notion, I believe the application should not continue running and wait for an external restart. It should immediately crash, letting a supervisor restart it.
