Dealing with panics #263
Comments
@Nemo157 I like the idea of doing a basic thing where we terminate all connections with a 500. I'm a big fan of the panic = abort model, where if we're panicking it means something has gone terminally wrong and we need to reboot. I'm a bit hesitant about any more elaborate schemes though, especially if there's a risk we might hang while we try to return the 500.

Re: poison errors -- parking-lot provides a way around poison errors specifically, but I'm not sure what to do about {dead,live}locks in general. I think your intuition is right, and we should have a way to provide status checks to any supervisor process so they can judge the status. But this indeed feels like a different topic than "dealing with panics". Perhaps we should open a separate issue on this?
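To make the poisoning comparison concrete, here is a minimal sketch (not from this thread; it assumes `parking_lot` as a dependency) showing why parking-lot sidesteps `PoisonError` after a panic while the lock is held:

```rust
// Sketch: std's Mutex is poisoned by a panic while locked; parking_lot's
// Mutex has no poisoning, so later locks still succeed.
use std::panic::{catch_unwind, AssertUnwindSafe};

fn main() {
    let std_mutex = std::sync::Mutex::new(0u32);
    let pl_mutex = parking_lot::Mutex::new(0u32);

    // Panic while holding the std lock: the mutex is poisoned afterwards.
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let _guard = std_mutex.lock().unwrap();
        panic!("handler panicked");
    }));
    assert!(std_mutex.lock().is_err()); // every later lock() returns PoisonError

    // Panic while holding the parking_lot lock: no poisoning, state is still usable.
    let _ = catch_unwind(AssertUnwindSafe(|| {
        let _guard = pl_mutex.lock();
        panic!("handler panicked");
    }));
    assert_eq!(*pl_mutex.lock(), 0);
}
```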
I brought up the poison error because that is a knock-on effect of panic unwinding. I don't know if there are other commonly used data structures you might store in the `State` that behave similarly. (Opened #264 for more specifics on {dead,live}locked applications.)
@Nemo157 - Are we talking about panics below the tide layer (as in hyper, the networking layers, etc.), or above it? I think below tide they are fairly self-contained and represent more serious failures which probably should, and would, crash.

One of the things I've been experimenting with for quite some time is to have middlewares (and basically everything) return `Result`s. With this approach, someone who wants different behaviour can easily handle panics manually in higher middleware layers, or simply swap out the default error-handler middleware to their liking.

Short version - I don't think we should be opinionated here at all, other than providing a default behaviour as part of the middleware stack that's easily replaceable. This is application-specific territory, and I don't think tide should interfere.
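A rough sketch of the layering being described; these types are made up purely for illustration and are not Tide's actual middleware or endpoint traits:

```rust
// Hypothetical shape of "everything returns Result, and one swappable
// middleware at the top of the stack turns errors into responses".
type Response = (u16, String);
type Error = String;

// Inner layers (endpoints, inner middleware) return Result and stay unopinionated.
fn endpoint() -> Result<Response, Error> {
    Err("database unavailable".to_string())
}

// The default error-handler middleware: maps any Err into a 500-style response.
// An application that wants different behaviour swaps this function out.
fn default_error_handler(inner: impl Fn() -> Result<Response, Error>) -> Response {
    match inner() {
        Ok(resp) => resp,
        Err(err) => (500, format!("internal error: {err}")),
    }
}

fn main() {
    let (status, body) = default_error_handler(endpoint);
    println!("{status} {body}");
}
```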
Above tide, in middleware or endpoints. In my case I was part way through implementing a feature, so hit an `unimplemented!()`.
@Nemo157 - I think a strategy similar to ASP.NET Core, or most JavaScript frameworks, is what I'd propose here. Have multiple preset middlewares at the start of the stack - one exception handler (which would become the panic handler doing `catch_unwind`). Also, if there's a way to get the panic profile of the compilation at runtime, we can simply use that and preset the handler accordingly when the panic mode is set to `abort`.

Most applications and supervisors (kubernetes, swarm) do health checks anyway that detect 500s and restart the application. Applications could be running other logic that's not just a pure server, or proper cleanups to shut down the application in case of multiple failures. So I don't think tide should crash the entire application, unless that's the behaviour chosen by the application.

Personally, when I'm running a server loop I expect the server to never crash other than for very serious memory allocation or corruption errors. In your example, granted the State is corrupted - but it's still up to the application to determine how much of the application logic is affected by this. For instance, there could be a completely unrelated endpoint that doesn't depend on the state at all, or even one that does - say, an endpoint I use to remotely examine the corrupted state. So I think this always has to be the application's choice - if we let the server crash, it effectively stops being a choice.
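On detecting the panic profile: there was no stable way to query it when this was written, but newer Rust exposes a `panic` cfg predicate that could drive the preset; a sketch under that assumption:

```rust
// Sketch only: cfg!(panic = "...") is a later addition to Rust and was not
// available at the time of this discussion.
fn main() {
    if cfg!(panic = "abort") {
        // Unwinding never reaches a catch_unwind-based middleware, so skip it
        // and rely on the supervisor restarting the process.
        println!("panic = abort: don't install the panic-to-500 middleware");
    } else {
        println!("panic = unwind: install the panic-to-500 middleware");
    }
}
```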
I can open a PR for my middleware that transforms panics into 500s (which could easily change to just returning the panic as an error if middleware starts returning results), and take a look at whether it's feasible to implement a middleware that stops the application (I have an idea of how to do it as a side-channel).
@Nemo157 - do we really need to stop the application? I mean, isn't this logic best left in one place - that place being the futures runtime? The Tokio threadpool, for instance, has a panic handler (and the current-thread runtime bubbles the panic up, if I'm right?), and I suspect other runtimes behave similarly. Should this decision-making be going into tide at all? Unless of course it's purely contained in the middleware layer, in which case any and all things can be explored at will.
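For comparison, an application that wants "crash the whole process on any panic" can already get it today without tide's involvement, via a plain std panic hook; a sketch using only the standard library:

```rust
// Sketch: application-level "any panic aborts the process", independent of
// tide or the executor in use.
use std::panic;
use std::process;

fn main() {
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // Let the default hook print the panic message (and backtrace) first...
        default_hook(info);
        // ...then take the whole process down so a supervisor can restart it.
        process::abort();
    }));

    // From here on, a panic on any thread aborts the process.
    panic!("simulated handler panic");
}
```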
I want to see exactly what is possible in this space with the current design - whether there are any issues with proxying a panic back from a middleware into the context in which the `serve` future is being polled. My goal is just to make sure that there is some way to implement any sane behaviour that a user might want, not to change what Tide itself does by default.
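A sketch of that side-channel idea (not Tide's API, only std): the panic-catching layer reports the panic over a channel and the code driving the server decides whether to shut down:

```rust
// Sketch: a panic-catching layer returns a 500 to the client and reports the
// panic over a channel; whoever drives the server decides what to do with it.
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::mpsc;

fn handler() -> String {
    panic!("endpoint blew up");
}

fn main() {
    let (panic_tx, panic_rx) = mpsc::channel::<String>();

    // Roughly what the middleware would do per request:
    let response = match catch_unwind(AssertUnwindSafe(handler)) {
        Ok(body) => body,
        Err(_) => {
            let _ = panic_tx.send("panic in handler".to_string());
            "500 Internal Server Error".to_string()
        }
    };
    println!("sent to client: {response}");

    // Roughly what the code around the `serve` future would do:
    if let Ok(report) = panic_rx.try_recv() {
        eprintln!("shutting down after: {report}");
        // e.g. trigger graceful shutdown of the whole application here
    }
}
```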
Ah, that sounds pretty cool! I'm keen to explore this as well. Currently I have an ugly workaround for clean shutdown with a custom hyper backend, using hyper's graceful-shutdown channel and manually connecting it to SIGINT/SIGTERM. One other alternative I was thinking of experimenting with, instead of http-rs/http-service#11, was to do that in one of the first middlewares - that makes it agnostic of the backend or http-service, but ends up with redundant checks on the hot path.
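For reference, the kind of wiring being described, sketched against a recent hyper 0.14 / tokio 1.x API rather than whatever versions were current at the time:

```rust
// Sketch: hyper's graceful shutdown driven by ctrl-c (SIGINT). SIGTERM could be
// added similarly with tokio::signal::unix::signal(SignalKind::terminate()).
use std::convert::Infallible;
use std::net::SocketAddr;

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

async fn hello(_req: Request<Body>) -> Result<Response<Body>, Infallible> {
    Ok(Response::new(Body::from("hello")))
}

#[tokio::main]
async fn main() -> Result<(), hyper::Error> {
    let make_svc = make_service_fn(|_conn| async { Ok::<_, Infallible>(service_fn(hello)) });
    let addr = SocketAddr::from(([127, 0, 0, 1], 8080));

    Server::bind(&addr)
        .serve(make_svc)
        .with_graceful_shutdown(async {
            // When the signal arrives, hyper stops accepting new connections
            // and drains the in-flight ones before the future completes.
            tokio::signal::ctrl_c().await.ok();
        })
        .await
}
```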
Random thought I just had related to this: rather than internally attempting to handle panics and return a 500, Tide (or more likely the default …)
@Nemo157 - Sounds like a good case for an additional model to experiment with, rather than a default replacement for what to do when a panic happens in the current process and current thread. If the handling can be abstracted to the point where both are provided out of the box and either can be chosen as easily as adding a middleware, I think that's a good story for tide. However, focusing in on one may allow additional optimizations like handing over the socket instead of proxying (or …)
Has the …

Where is …
Original issue description:

Currently Tide doesn't provide any way to deal with panics in handlers; these just unwind up to the executor and (at least in the case of Tokio) kill the executor's thread. During the unwinding the TCP connection is dropped and the client gets something like "The connection was reset". The application as a whole just keeps running (I'm not sure what happens with other requests currently being handled on the same worker thread).

My first thought on dealing with this is having a middleware that uses `catch_unwind` to catch the panic and return a minimal 500 to the client. (I have prototyped this and it works very easily.)

The other part of this is that when I encountered it I had locked a `Mutex` stored in the `State`; this subsequently started returning `PoisonError` on access, so my application was effectively dead. Maybe when a panic is encountered it should be proxied through to the future returned by `serve` and resumed on the main thread to try and kill the entire application. Alternatively this could be dealt with at the application level using "health checks" with support from a service coordinator (like Kubernetes). Since it depends on use case, it may make sense to provide middleware supporting different options and allow the user to choose.
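For reference, a minimal sketch of the `catch_unwind` middleware idea above, written against plain `futures` 0.3 rather than Tide's actual middleware trait:

```rust
// Sketch: wrap a possibly-panicking handler future in catch_unwind and map a
// panic to a bare 500 response.
use futures::FutureExt;
use std::panic::AssertUnwindSafe;

// Hypothetical handler: any `async fn` that may panic part-way through.
async fn handler() -> String {
    panic!("unimplemented feature");
}

async fn with_panic_to_500() -> (u16, String) {
    match AssertUnwindSafe(handler()).catch_unwind().await {
        Ok(body) => (200, body),
        // The panic payload is a Box<dyn Any + Send>; downcast it if the message is wanted.
        Err(_panic) => (500, "Internal Server Error".to_string()),
    }
}

fn main() {
    let (status, body) = futures::executor::block_on(with_panic_to_500());
    println!("{status} {body}");
}
```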