engine: add input grace period and check pending chunks on shutdown #9952
base: master
Hi @edsiper / @leonardo-albertovich, could you please review this PR and provide your feedback? Thanks!
comments added
1. Input grace period

Currently, Fluent Bit pauses all inputs 1 second after SIGTERM. This change creates an input grace period, which by default is one half of the total Grace setting. This means that halfway through the grace period, Fluent Bit stops accepting new logs and only sends logs pending in the buffers.

2. Check pending chunks on shutdown

Previously, the engine shut down immediately if there were no pending tasks. A task is created from a chunk in the buffer. If there is a new chunk but no task yet, the engine should keep running until the task is created and completed. This change makes the engine wait on shutdown for all pending chunks until the max grace period has expired.

Signed-off-by: Wesley Pettit <[email protected]>
Co-authored-by: Anuj Singh <[email protected]>
Signed-off-by: Wesley Pettit <[email protected]>
…ault: off) Signed-off-by: Anuj Singh <[email protected]>
See corresponding PR: fluent/fluent-bit#9952 Signed-off-by: Anuj Singh <[email protected]>
Docs PR: fluent/fluent-bit-docs#1667
I'll review this PR tomorrow.
Signed-off-by: Anuj Singh <[email protected]> Co-authored-by: Wesley Pettit <[email protected]>
Are you done making changes, @singholt? I want to review this today, but only if it's the final code.
Yes, I just fixed the compilation error that CI caught! It's ready for your review.
What does this PR do?
This PR makes the following changes:
1. Input grace period: Currently, Fluent Bit pauses all inputs 1 second after SIGTERM. This PR creates an input grace period, which by default is half of the total "Grace" setting. This means that halfway through the grace period, Fluent Bit stops accepting new logs and only sends logs pending in the buffers.
2. Check pending chunks on shutdown: Previously, the engine shut down immediately if there were no pending tasks. A task is created from a chunk in the buffer. If there is a new chunk but no task yet, the engine should keep running until the task is created and completed. This change makes the engine wait on shutdown for all pending chunks until the max grace period has expired.
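The two shutdown rules described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the struct, its fields, and the function names are all hypothetical, and the rounding of "half the grace period" to whole seconds is an assumption. Whether the engine waits for the input grace period to elapse before exiting is not stated here, so the sketch does not model it.

```c
#include <stdbool.h>

/* Hypothetical shutdown state; names are illustrative, not Fluent Bit's. */
struct shutdown_state {
    int grace;          /* total grace period, in seconds */
    int grace_input;    /* input grace period; <= 0 means "use the default" */
    int elapsed;        /* seconds since SIGTERM was received */
    int pending_tasks;  /* tasks already created from chunks */
    int pending_chunks; /* chunks in the buffer that have no task yet */
};

/* Default input grace period is half of the total grace period
 * (integer division is an assumption of this sketch). */
static int input_grace(const struct shutdown_state *s)
{
    if (s->grace_input > 0 && s->grace_input <= s->grace) {
        return s->grace_input;
    }
    return s->grace / 2;
}

/* Inputs stop accepting new logs once the input grace period has elapsed. */
static bool inputs_should_pause(const struct shutdown_state *s)
{
    return s->elapsed >= input_grace(s);
}

/* The engine keeps running while tasks OR chunks are pending,
 * but never beyond the total grace period. */
static bool engine_can_exit(const struct shutdown_state *s)
{
    if (s->elapsed >= s->grace) {
        return true;
    }
    return s->pending_tasks == 0 && s->pending_chunks == 0;
}
```

With a total grace period of 30 seconds and no explicit input grace period, this sketch pauses inputs at 15 seconds and keeps the engine alive while a chunk without a task is still pending, up to the 30-second limit.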
What use-case does this PR aim to solve?
In production environments with high-throughput logging, applications can generate significant volumes of logs even during the shutdown phase. Container orchestration services, such as Amazon ECS, give containers a configurable graceful shutdown period (30 seconds by default in ECS) to terminate their operations properly. However, the current implementation may drop logs during this shutdown process, because it stops accepting inputs 1 second after SIGTERM and may not process all buffered data.
By introducing an input grace period and improving the pending chunk verification, Fluent Bit can make better use of the provided shutdown window: it continues to accept critical logs for a portion of the grace period while ensuring that all buffered data is processed and delivered to its destination. This results in more meaningful use of the shutdown time rather than simply discarding unprocessed input.
These improvements are also valuable when applications perform controlled shutdowns due to conditions like OOM or health check failures - capturing crucial diagnostic logs during the application's final moments.
Fluent Bit is licensed under Apache 2.0. By submitting this pull request, I understand that this code will be released under the terms of that license.