Description
Context:
- feat(rust): add second example using non-op internal extension, expand README aws-samples/graceful-shutdown-with-aws-lambda#19 (comment) - discussion where it was suggested
- feat(tracing, lambda-runtime): add support for custom writer with default tracing subscriber, add turnkey graceful shutdown helper behind 'graceful-shutdown' feature flag #982 - related PR that includes graceful shutdown example
It would be nice to have a simple turnkey API to enable users to handle graceful shutdowns properly in their functions.
Essentially we want to make it really easy to do something like the below, without reasoning about extensions, selecting across signals, etc. Just move into a closure whatever resources you need to clean up and invoke whatever logic.
let log_guard = tracing_appender::NonBlocking::new(std::io::stdout);
tokio::spawn(async move {
let mut sigint = signal(SignalKind::interrupt()).unwrap();
let mut sigterm = signal(SignalKind::terminate()).unwrap();
tokio::select! {
_sigint = sigint.recv() => {
tracing::info!("[runtime] SIGINT received");
tracing::info!("[runtime] Graceful shutdown in progress ...");
std::mem::drop(log_guard);
eprintln!("[runtime] Graceful shutdown completed");
std::process::exit(0);
},
_sigterm = sigterm.recv()=> {
tracing::info!("[runtime] SIGTERM received");
tracing::info!("[runtime] Graceful shutdown in progress ...");
std::mem::drop(log_guard);
eprintln!("[runtime] Graceful shutdown completed");
std::process::exit(0);
},
}
});
This requires two parts:
- An API that accepts an async closure that can be invoked in case a
SIGTERM
orSIGINT
is caught - Under the hood handling to ensure that graceful shutdown signals are sent (meaning, an extension is registered)
A few questions to answer with some initial thoughts:
Should the user-specified closure be able to differentiate between SIGTERM
or SIGINT
/ have some sort of input parameter?
I think probably no, we should keep it on rails and avoid surfacing types, maybe throw in an implicit tracing::debug
log if tracing feature is enabled. Anyway the caller can always specify their signal handling directly if they need more customizability.
Should we just always specify a no-op internal extension, or do we attempt to avoid it based on detection of other registered extensions, caller specifying, etc
I think probably we just always implicitly add a no-op extension and keep the API simple. We can get away with this because of two things:
- we can simply not subscribe to any events in the no-op extension
- the
next event
extension API takes the form of a long poll that just hangs indefinitely if no subscribed events are enabled (ie, the extension client connects to lambda orchestrator, and then just waits for data that never comes)
This means that the no-op extension is very cheap - we can just spawn a task that drives its future, but is never woken after initial connect (or very rarely woken, if for some reason the long-running connection is interrupted).
As long as we document some special cased extension name that the caller shouldn't collide with, I don't see much downside to just always initializing it. We could probably do some smart detection of whether other extensions are registered, right before starting the function handler, but it strikes me as needless complexity. Same thing goes for explicit 'opt out of no-op extension' api facing users.
Can we assume we are in a tokio runtime?
Not sure... This matters because the previously mentioned approach involves spawning a background task. Tokio with rt
feature looks like a hard dependency of the runtime currently. If that's the case, I guess we can also just document that it will panic if called outside a tokio runtime (and that would happen on startup so it would be obvious). It would be more awkward if we had to bring in a tokio feature flag or something. I need to poke around more.