
Dealing with {dead,live}locked applications #264


Open
Nemo157 opened this issue May 27, 2019 · 6 comments
Labels
design Open design question

Comments

@Nemo157
Contributor

Nemo157 commented May 27, 2019

This is mostly an application design issue, but maybe there are ways in which Tide can help users to design their application, or provide middleware that can detect/help resolve issues.

There are a few ways an application can get itself {dead,live}locked, e.g. deadlocking while locking multiple mutexes from the application State, or panicking while holding a mutex such that all future requests instantly error with a PoisonError.
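
As a concrete illustration of the first case, here is a minimal sketch (the State fields are hypothetical) of two handlers that lock the same pair of mutexes in opposite order and can therefore deadlock under concurrent requests:

use std::sync::Mutex;

// Hypothetical application State with two independently locked fields.
struct State {
    accounts: Mutex<Vec<u64>>,
    audit_log: Mutex<Vec<String>>,
}

// Handler A locks `accounts`, then `audit_log`.
fn transfer(state: &State) {
    let accounts = state.accounts.lock().unwrap();
    let mut log = state.audit_log.lock().unwrap();
    log.push(format!("balances: {:?}", *accounts));
}

// Handler B locks `audit_log`, then `accounts`. If A and B run
// concurrently, each can end up holding one lock while waiting
// forever for the other: a classic lock-ordering deadlock.
fn report(state: &State) {
    let log = state.audit_log.lock().unwrap();
    let accounts = state.accounts.lock().unwrap();
    println!("{} log entries, {} accounts", log.len(), accounts.len());
}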

A common way to deal with this in today's landscape of service supervisors is to have a "health check" endpoint that verifies whether the service is currently usable; a supervisor (such as Kubernetes) then uses this endpoint to decide whether an instance of the service should be taken out of rotation and restarted. Tide could potentially provide pieces of a solution like this to make it easier for developers, or include examples of how such an endpoint could be designed to guide developers.
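
As a rough sketch of what such a check could look like (the State field, handler shape, and status codes here are assumptions, not an actual Tide API):

use std::sync::Mutex;

struct State {
    the_thing: Mutex<Vec<u64>>,
}

// Status a hypothetical /healthz endpoint could respond with: 200 if the
// shared state is still usable, 503 if a poisoned lock means every real
// request would fail, signalling the supervisor to take this instance
// out of rotation and restart it.
fn health_status(state: &State) -> u16 {
    if state.the_thing.is_poisoned() { 503 } else { 200 }
}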

(split off from #263 (comment))

@prasannavl
Contributor

I think it'd be really cool if we could have this and make it part of an even larger middleware that provides more common instrumentation, including custom instrumentation (similar to https://golang.org/pkg/expvar/).
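
For instance, a very rough expvar-style sketch (all names made up) of the kind of registry such an instrumentation middleware could expose on a debug endpoint:

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};

// Handlers bump named counters; the middleware could serialize the whole
// map as JSON on e.g. /debug/vars, similar to Go's expvar.
struct Metrics {
    counters: HashMap<&'static str, AtomicU64>,
}

impl Metrics {
    fn incr(&self, name: &'static str) {
        if let Some(counter) = self.counters.get(name) {
            counter.fetch_add(1, Ordering::Relaxed);
        }
    }
}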

@mehcode
Contributor

mehcode commented Jun 2, 2019

Since we're talking about PoisonError and parking-lot, as a PSA: please look at futures::lock::Mutex. std::sync::Mutex is under the std::sync umbrella and is for synchronization between threads.

Usage becomes:

// in a request handler, where `the_thing` is a futures::lock::Mutex in the app State
let the_thing = cx.state().the_thing.lock().await;

No PoisonError to be concerned with.


This is probably out of scope for Tide and more something that Rust itself should address, but IO-blocking behavior in an async fn can be a huge footgun (as it's not always obvious). Perhaps some kind of instrumentation in debug Rust builds could at least issue runtime warnings.
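
A small sketch of the footgun (the non-blocking variant assumes an async-std style runtime with a task-aware timer):

use std::time::Duration;

// Looks harmless, but this blocks the executor thread for a full second;
// no other task scheduled on that thread makes progress in the meantime.
async fn handler_blocking() {
    std::thread::sleep(Duration::from_secs(1));
}

// A task-aware timer yields back to the executor instead of blocking it.
async fn handler_yielding() {
    async_std::task::sleep(Duration::from_secs(1)).await;
}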

@Nemo157
Contributor Author

Nemo157 commented Jun 3, 2019

> No PoisonError to be concerned with.

Unless the waker panics when attempting to wake another task after we abandon acquiring the mutex; that will poison the std::sync::Mutex protecting the list of wakers.

The scope of poisoning is much smaller, since futures::lock::MutexGuard doesn't poison when dropped during a panic, but there are still situations that can poison a futures::lock::Mutex and result in live-locking all future requests that rely on it.

It would probably still be good to have a better example of live-locking that doesn't involve such unlikely situations or the use of incorrect synchronization primitives (futures::lock really could be futures::sync; it contains utilities for synchronization between tasks).

@mehcode
Contributor

mehcode commented Jun 3, 2019

> Unless the waker panics when attempting to wake another task after we abandon acquiring the mutex; that will poison the std::sync::Mutex protecting the list of wakers.

Fair enough. I had meant that you don't need to deal with the potential error return, but it seems it's a panic instead... not sure I like that. I didn't review the lock that closely.


> Probably would still be good to have a better example [...]

Totally agree that good examples should be written. I do feel mutexes in general (even futures-aware ones) are weird in async code. You generally want a lock-free data structure if anything; state is bad (tm) and all that.
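
A trivial sketch of that direction (hypothetical field names), keeping shared state in an atomic instead of behind a mutex:

use std::sync::atomic::{AtomicU64, Ordering};

// Nothing to poison, no guard to hold across an .await, nothing to deadlock on.
struct State {
    requests_served: AtomicU64,
}

fn record_request(state: &State) -> u64 {
    state.requests_served.fetch_add(1, Ordering::Relaxed) + 1
}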


> futures::lock really could be futures::sync, it contains utilities for synchronization between tasks

It was, in futures v0.1. Not sure of the reasoning for renaming the module in 0.3.

@yoshuawuyts
Member

Notes from triage: this is not a priority at the moment, but we do think we'll eventually want to revisit this. We're keeping the issue open to track the discussion.

@bIgBV added the design (Open design question) label on Jul 25, 2019
@skade
Contributor

skade commented Nov 5, 2019

While I agree with the general notion, I believe the application should not continue running and wait for an external restart. It should immediately crash, letting a supervisor restart it.
