engine: add input grace period and check pending chunks on shutdown #9952

Open
wants to merge 4 commits into
base: master

singholt
Contributor

@singholt singholt commented Feb 17, 2025

What does this PR do?

This PR makes the following changes:

  1. Add an input grace period:

Currently, Fluent Bit pauses all inputs 1 second after SIGTERM. This PR adds an input grace period, which by default is half of the total "Grace" setting. Halfway through the grace period, Fluent Bit stops accepting new logs and only sends logs that are still pending in the buffers.

  2. Check pending chunks on shutdown:

Previously, the engine shut down immediately if there were no pending tasks. A task is created from a chunk in the buffer, so if a new chunk exists but has no task yet, the engine should keep running until the task is created and completed. With this change, the engine waits on shutdown for all pending chunks until the max grace period has expired.
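
The two changes can be sketched as a small decision model. This is only an illustration of the behavior described in this PR, not the actual engine code: the function names and parameters are hypothetical, and it assumes the default input grace period is computed with integer division.

```c
#include <stdbool.h>

/* Hypothetical helper names; none of these are taken from the Fluent Bit
 * source. All times are in seconds, counted from SIGTERM. */

/* Default input grace period: half of the total "Grace" setting.
 * Assumption: integer division, so an odd value rounds down. */
int default_input_grace(int grace)
{
    return grace / 2;
}

/* Inputs stop accepting new logs once the input grace period has elapsed. */
bool inputs_paused(int grace, int elapsed)
{
    return elapsed >= default_input_grace(grace);
}

/* The engine may exit when the max grace period has expired, or when
 * nothing is left to deliver. Counting pending chunks (not only pending
 * tasks) is the second change in this PR: a chunk with no task yet
 * still keeps the engine running. */
bool engine_may_exit(int grace, int elapsed,
                     int pending_tasks, int pending_chunks)
{
    if (elapsed >= grace) {
        return true;
    }
    return pending_tasks == 0 && pending_chunks == 0;
}
```

With a Grace of 30 seconds, this model pauses inputs 15 seconds after SIGTERM and keeps the engine running until every chunk has been turned into a task and completed, or until the full 30 seconds have passed.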

What use-case does this PR aim to solve?

In production environments with high-throughput logging, applications can generate significant volumes of logs even during the shutdown phase. Container orchestration services, such as Amazon ECS, give containers a configurable graceful shutdown period (30 seconds by default in ECS) to terminate their operations cleanly. However, the current implementation may drop logs during this window, because it stops accepting input 1 second after SIGTERM and may not process all buffered data.

By introducing an input grace period and checking for pending chunks, Fluent Bit can make better use of the provided shutdown window: it continues to accept logs for part of the grace period, then works to process all buffered data and deliver it to its destinations before the grace period expires. The shutdown time is spent delivering logs rather than discarding unprocessed input.

These improvements are also valuable when applications perform controlled shutdowns due to conditions like OOM or health check failures - capturing crucial diagnostic logs during the application's final moments.


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@singholt singholt changed the title [wip] do not review yet [wip] engine: add input grace period and check pending chunks on shutdown Feb 19, 2025
@singholt singholt marked this pull request as ready for review February 19, 2025 21:42
@singholt singholt changed the title [wip] engine: add input grace period and check pending chunks on shutdown engine: add input grace period and check pending chunks on shutdown Feb 19, 2025
@singholt
Contributor Author

Hi @edsiper / @leonardo-albertovich, could you please review this PR and provide your feedback? Thanks!

Member

@edsiper edsiper left a comment

comments added

PettitWesley and others added 3 commits May 19, 2025 09:24
1. Input grace period

Currently, Fluent Bit pauses all inputs 1 second after SIGTERM. This change creates an input grace period, which by default is one half of the total Grace setting.
This means that halfway through the grace period Fluent Bit stops accepting any new logs and only sends logs pending in the buffers.

2. Check pending chunks on shutdown

Previously, the engine shut down immediately if there were no pending tasks. A task is created from a chunk in the buffer.
If there is a new chunk, but no task yet, the engine should keep running until the task is created and completed.

This change makes the engine wait on shutdown for all pending chunks until the max grace period has expired.

Signed-off-by: Wesley Pettit <[email protected]>
Co-authored-by: Anuj Singh <[email protected]>
@singholt singholt force-pushed the engine-grace-input branch from ccbcbf8 to 77d61d1 Compare May 19, 2025 16:25
singholt added a commit to singholt/fluent-bit-docs that referenced this pull request May 19, 2025
singholt added a commit to singholt/fluent-bit-docs that referenced this pull request May 19, 2025
@singholt
Contributor Author

Docs PR: fluent/fluent-bit-docs#1667

@leonardo-albertovich
Collaborator

I'll review this PR tomorrow.

@leonardo-albertovich leonardo-albertovich self-assigned this May 19, 2025
Signed-off-by: Anuj Singh <[email protected]>
Co-authored-by: Wesley Pettit <[email protected]>
@leonardo-albertovich
Collaborator

Are you done making changes @singholt? I want to review this today but only if it's the final code.

@singholt
Contributor Author

singholt commented May 20, 2025

> Are you done making changes @singholt? I want to review this today but only if it's the final code.

Yes, I just fixed the compilation error that CI caught! It's ready for your review.

@singholt
Contributor Author

@leonardo-albertovich PTAL
