-
Notifications
You must be signed in to change notification settings - Fork 1.9k
engine: fix threaded input data loss during shutdown #11337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add flb_ring_buffer_get_used() to retrieve the current used size of a ring buffer. This is needed to check pending data during shutdown sequence. Signed-off-by: jinyong.choi <[email protected]>
📝 WalkthroughWalkthroughAdds APIs and runtime checks to expose ring-buffer usage and input-thread pause state, integrates ring-buffer collection into engine startup/shutdown, and prevents resume while shutting down so pending ring-buffer data is flushed before exit. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Engine
participant ThreadMgr as "Input Threads"
participant RBCollector as "RB Collector"
participant RingBuf as "Ring Buffers"
Engine->>Engine: rb_size = flb_input_chunk_get_total_ring_buffer_size(config)
alt rb_size > 0
Engine->>RBCollector: flb_input_chunk_ring_buffer_collector(config, NULL)
RBCollector->>RingBuf: drain ring buffers -> flush to chunks
RBCollector-->>Engine: collection triggered
end
Engine->>ThreadMgr: all_threaded_inputs_paused()? (check is_paused)
alt not all paused
ThreadMgr-->>Engine: some inputs not paused
Engine->>Engine: increment pending work / wait
else all paused
ThreadMgr-->>Engine: all paused
end
Engine->>Engine: proceed to finalize when pending==0
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧠 Learnings (4)📚 Learning: 2025-08-29T06:25:02.561ZApplied to files:
📚 Learning: 2025-08-31T12:46:11.940ZApplied to files:
📚 Learning: 2025-08-29T06:25:27.250ZApplied to files:
📚 Learning: 2025-11-21T06:23:29.770ZApplied to files:
🧬 Code graph analysis (2)src/flb_input_chunk.c (3)
src/flb_engine.c (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (31)
🔇 Additional comments (7)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6f4b64b0e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @include/fluent-bit/flb_input_thread.h:
- Around line 92-98: Replace the volatile flag with a C11 atomic type so thread
reads/writes have proper synchronization: change the declaration of is_paused
from "volatile int is_paused;" to an atomic integer (e.g., "_Atomic int
is_paused;" or "atomic_int is_paused;") and add the required include
(<stdatomic.h>) where this header or its implementation uses the flag; update
any direct assignments/reads in the input thread and main thread to use plain
loads/stores (or atomic_store/atomic_load) as appropriate to preserve the same
semantics.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
include/fluent-bit/flb_input_chunk.hinclude/fluent-bit/flb_input_thread.hinclude/fluent-bit/flb_ring_buffer.hsrc/flb_engine.csrc/flb_input_chunk.csrc/flb_input_thread.csrc/flb_ring_buffer.c
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_input_chunk.hsrc/flb_engine.cinclude/fluent-bit/flb_ring_buffer.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/flb_engine.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
src/flb_engine.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
src/flb_engine.c
🧬 Code graph analysis (4)
include/fluent-bit/flb_input_chunk.h (1)
src/flb_input_chunk.c (1)
flb_input_chunk_total_ring_buffers_size(3310-3324)
src/flb_engine.c (1)
src/flb_input_chunk.c (2)
flb_input_chunk_total_ring_buffers_size(3310-3324)flb_input_chunk_ring_buffer_collector(3061-3111)
include/fluent-bit/flb_ring_buffer.h (1)
src/flb_ring_buffer.c (1)
flb_ring_buffer_get_used(205-208)
src/flb_input_chunk.c (3)
include/fluent-bit/flb_input.h (1)
flb_input_buf_paused(705-715)src/flb_input.c (1)
flb_input_name(790-797)src/flb_ring_buffer.c (1)
flb_ring_buffer_get_used(205-208)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (32)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: Agent
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COMPILER_STRICT_POINTER_TYPES=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-without-cxx (3.31.6)
- GitHub Check: pr-compile-centos-7
- GitHub Check: PR - fuzzing test
🔇 Additional comments (12)
src/flb_ring_buffer.c (1)
205-208: LGTM - Consistent with existing API style.The function is a straightforward wrapper around
lwrb_get_full(). The lack of NULL checks is consistent with other functions in this file (flb_ring_buffer_write,flb_ring_buffer_read), indicating that callers are expected to pass valid pointers.src/flb_input_thread.c (2)
81-84: LGTM - Proper pause state tracking.The
is_pausedflag is set after the pause callback completes, correctly signaling to the main thread that pause processing is finished. The conditional check protects against accessingthiwhen it's not available.
90-93: LGTM - Proper resume state tracking.The
is_pausedflag is cleared after the resume callback completes, maintaining symmetry with the pause handler. The defensive checks are appropriate.include/fluent-bit/flb_input_chunk.h (1)
165-165: LGTM - New API for total ring buffer size tracking.The function signature is clear and the implementation (seen in src/flb_input_chunk.c:3309-3323) correctly sums the used space across all threaded inputs' ring buffers.
include/fluent-bit/flb_ring_buffer.h (1)
43-43: LGTM - New API exposes ring buffer usage.The function provides a clean interface to query the amount of used space in the ring buffer, essential for the shutdown coordination logic. The implementation delegates to the underlying
lwrblibrary.src/flb_input_chunk.c (4)
2416-2426: LGTM! Correct guard to prevent input resume during shutdown.The
is_shutting_downcheck correctly prevents inputs from being resumed during the graceful shutdown phase, ensuring the engine can properly drain ring buffers without new data being ingested.
2672-2683: LGTM! Critical fix for data loss prevention during shutdown.The conditional bypass of the pause check when
is_shutting_down == FLB_TRUEcorrectly allows pending ring buffer data to be flushed to chunks during graceful shutdown. The logic is sound: normal operation respects backpressure, while shutdown prioritizes data preservation.
3074-3084: LGTM! Consistent shutdown bypass logic.This mirrors the pause-bypass logic in
input_chunk_append_raw, ensuring the ring buffer collector can fully drain all pending data during shutdown regardless of pause state. The symmetry between producer and collector is correct.
3305-3324: LGTM! Clean implementation of ring buffer size aggregation.The function correctly:
- Guards against non-threaded inputs and null ring buffers
- Aggregates used space across all threaded input instances
- Returns appropriate size_t type
The underlying
lwrb_get_fullshould be safe for single-reader scenarios (main engine thread), which matches the usage pattern here.src/flb_engine.c (3)
36-40: LGTM! Required includes for new functionality.Both headers are necessary:
flb_input_thread.hfor accessing theis_pausedflag in threaded input instancesflb_input_chunk.hforflb_input_chunk_total_ring_buffers_sizeandflb_input_chunk_ring_buffer_collector
805-822: LGTM! Proper helper for threaded input pause verification.The function correctly checks all threaded input instances to verify they have acknowledged the pause signal. The defensive check for both
is_threadedandthibefore accessingis_pausedis appropriate.
1200-1213: Well-structured shutdown sequence for ring buffer draining.The logic correctly:
- Includes ring buffer presence in pending work calculation (
rb_size > 0adds 1)- Actively drains ring buffers each shutdown tick when data is pending
- Only checks thread pause completion after all other work is done (tasks=0, chunks=0, ring buffer empty)
- Continues waiting if threaded inputs haven't fully acknowledged pause
This ensures data isn't lost during shutdown by waiting for both ring buffer drainage and thread coordination.
|
|
||
| /* | ||
| * Pause state flag for shutdown synchronization. | ||
| * Set to 1 when thread completes pause processing. | ||
| * Checked by main thread to ensure safe shutdown. | ||
| */ | ||
| volatile int is_paused; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the codebase uses atomic operations elsewhere for similar coordination patterns
rg -n -C3 --type=c '\b_Atomic\b|atomic_|__atomic_' --glob '!**/tests/**' --glob '!**/examples/**'Repository: fluent/fluent-bit
Length of output: 50374
🏁 Script executed:
#!/bin/bash
# Find usage of is_paused variable
rg -n 'is_paused' --type=c -B2 -A2Repository: fluent/fluent-bit
Length of output: 3871
🏁 Script executed:
#!/bin/bash
# Check context of is_paused usage in flb_input_thread.c
sed -n '75,100p' src/flb_input_thread.c
# Check context in flb_engine.c
sed -n '810,825p' src/flb_engine.cRepository: fluent/fluent-bit
Length of output: 1382
🏁 Script executed:
#!/bin/bash
# Check the entire flb_engine_shutdown_timeout function
sed -n '800,830p' src/flb_engine.c
# Check what calls the function where is_paused is written
sed -n '50,120p' src/flb_input_thread.cRepository: fluent/fluent-bit
Length of output: 2712
🏁 Script executed:
#!/bin/bash
# Check if is_paused is accessed in any signal handlers
rg -n 'signal|SIGTERM|SIGINT' src/flb_input_thread.c src/flb_engine.c -B2 -A2 | head -50
# Check the structure definition to understand the context better
sed -n '50,100p' include/fluent-bit/flb_input_thread.hRepository: fluent/fluent-bit
Length of output: 4721
🏁 Script executed:
#!/bin/bash
# Check how the pause operation is triggered (from which thread context)
rg -n 'FLB_INPUT_THREAD_PAUSE' --type=c -B5 -A5
# Check if there's any locking around the is_paused read
sed -n '805,825p' src/flb_engine.cRepository: fluent/fluent-bit
Length of output: 2956
Consider using C11 _Atomic for thread-safe flag access.
The volatile qualifier ensures visibility but lacks atomicity guarantees per the C standard. In this thread-to-thread coordination pattern, is_paused is written by the input thread and read by the main thread during shutdown without explicit locking. While volatile prevents compiler optimizations from caching the value, atomic operations (e.g., _Atomic int or platform-specific atomics) would provide explicit synchronization semantics and better portability guarantees. This aligns with the codebase's use of sig_atomic_t for similar shutdown coordination patterns elsewhere.
🤖 Prompt for AI Agents
In @include/fluent-bit/flb_input_thread.h around lines 92-98, Replace the
volatile flag with a C11 atomic type so thread reads/writes have proper
synchronization: change the declaration of is_paused from "volatile int
is_paused;" to an atomic integer (e.g., "_Atomic int is_paused;" or "atomic_int
is_paused;") and add the required include (<stdatomic.h>) where this header or
its implementation uses the flag; update any direct assignments/reads in the
input thread and main thread to use plain loads/stores (or
atomic_store/atomic_load) as appropriate to preserve the same semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kept volatile int to maintain consistency with existing thread coordination flags in this codebase (e.g., is_shutting_down in flb_config.h). The single-direction write pattern with periodic polling makes this sufficient for the current use case. If there's a preference for stronger atomics with platform abstraction, I'm happy to update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the declaration of is_shutting_down in flb_config.h to verify the user's claim
rg -n 'is_shutting_down' --type=c include/fluent-bit/flb_config.h -B2 -A2
Length of output: 349
🏁 Script executed:
#!/bin/bash
# Check for other volatile flags used for thread coordination in the codebase
rg -n 'volatile.*int' include/fluent-bit/*.h --type=c | head -20
Length of output: 200
@jinyongchoi, thank you for the clarification! You're absolutely right—is_shutting_down in flb_config.h is indeed a plain int (line 64), and your use of volatile int for consistency with the existing thread coordination patterns in this codebase makes perfect sense. The single-direction write with periodic polling is a reasonable pattern for this use case, and maintaining consistency is important. Your implementation is fine as-is. 👍
🧬 Code Graph Analysis Results
include/fluent-bit/flb_input_thread.h
[Snippet 1 - Summary]
- File: include/fluent-bit/flb_input_thread.h
- Topic: Pause/shutdown synchronization flag
- Summary: The struct flb_input_thread_instance includes a volatile int named is_paused, described as a pause state flag used during shutdown synchronization. It is set to 1 when the thread completes pause processing and is checked by the main thread to ensure safe shutdown. This aligns with the user’s note about using volatile integers for coordination flags (e.g., is_shutting_down). No code snippet is provided here to avoid duplicating the exact implementation, but the field’s existence and intended concurrency behavior are documented for reviewer context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cosmo0920 Hello,
Could you please share your preference on this? I'm not certain about C11 _Atomic support across all platforms we target (particularly Windows/MSVC).
If there's platform-abstracted atomic support or a preference for _Atomic with fallbacks, I'm happy to update it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request addresses a critical data loss issue during Fluent Bit shutdown where threaded input plugins with data pending in ring buffers could exit before flushing that data to chunks.
Key Changes:
- Added ring buffer monitoring during shutdown to track pending data
- Implemented pause acknowledgement tracking for threaded inputs
- Modified pause checks to allow data flushing during shutdown
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/flb_ring_buffer.c |
Adds flb_ring_buffer_get_used() function to query pending data size in ring buffer |
include/fluent-bit/flb_ring_buffer.h |
Declares new flb_ring_buffer_get_used() function |
include/fluent-bit/flb_input_thread.h |
Adds is_paused volatile flag to track thread pause state |
src/flb_input_thread.c |
Sets/clears is_paused flag on pause/resume operations |
src/flb_input_chunk.c |
Bypasses pause checks during shutdown in collector and append functions; adds helper to calculate total ring buffer size |
include/fluent-bit/flb_input_chunk.h |
Declares flb_input_chunk_total_ring_buffers_size() helper function |
src/flb_engine.c |
Enhances shutdown logic to wait for threaded input pause acknowledgement and drain ring buffers before final exit |
Comments suppressed due to low confidence (1)
src/flb_input_chunk.c:3090
- At line 3086, the code attempts to read from
ins->rbwithout checking if the ring buffer exists. While the previous check at line 3081 callsflb_input_buf_paused(ins)which might implicitly verify the ring buffer, there's no explicit null check forins->rbbefore callingflb_ring_buffer_read. This could cause a null pointer dereference if an input instance doesn't have a ring buffer. Add a check:if (!ins->rb) continue;before the while loop starts.
ret = flb_ring_buffer_read(ins->rb,
(void *) &cr,
sizeof(cr));
if (ret != 0) {
break;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/flb_input_chunk.c
Outdated
| * During normal operation, respect the pause state to maintain | ||
| * backpressure. However, during shutdown (is_shutting_down=TRUE), | ||
| * we must flush all remaining data in the ring buffer to prevent | ||
| * data loss, so we skip the pause check. |
Copilot
AI
Jan 5, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment at lines 3074-3079 provides good context, but it could be more precise about the behavior. Consider clarifying that skipping the pause check during shutdown allows the ring buffer to be drained even when backpressure would normally prevent it. This is critical for preventing data loss during graceful shutdown.
| * During normal operation, respect the pause state to maintain | |
| * backpressure. However, during shutdown (is_shutting_down=TRUE), | |
| * we must flush all remaining data in the ring buffer to prevent | |
| * data loss, so we skip the pause check. | |
| * During normal operation we respect the pause state to maintain | |
| * backpressure: if the input is paused we stop consuming from | |
| * the ring buffer. | |
| * | |
| * During shutdown (is_shutting_down == FLB_TRUE) we intentionally | |
| * skip this pause check so the ring buffer can be fully drained, | |
| * even when backpressure would normally prevent further reads. | |
| * This is critical to flush all enqueued records and avoid data | |
| * loss during graceful shutdown. |
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6f4b64b0e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
c6f4b64 to
9fbf309
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Add volatile is_paused flag to track when a threaded input has completed its pause operation. This allows the engine to verify all threaded inputs are fully paused before proceeding with shutdown. Signed-off-by: jinyong.choi <[email protected]>
During shutdown (is_shutting_down=TRUE), bypass the pause check to allow flushing remaining ring buffer data. Also add helper function flb_input_chunk_total_ring_buffers_size() to calculate total pending data across all threaded inputs, and improve cleanup logging. Signed-off-by: jinyong.choi <[email protected]>
During grace period, ensure all threaded inputs have acknowledged pause before final shutdown. Also drain pending ring buffer data to prevent data loss. This fixes a race condition where the engine could exit while threaded inputs still had buffered data. Signed-off-by: jinyong.choi <[email protected]>
9fbf309 to
3482153
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. 🚀 ℹ️ About Codex in GitHubCodex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
While testing this PR, I found that I've added documentation for these parameters to the Tail input plugin: fluent/fluent-bit-docs#2320 |
|
@edsiper @cosmo0920 |
Problem
When Fluent Bit shuts down with grace period, threaded input plugins may still have data pending in their ring buffers. The engine could exit before this data was flushed to chunks, causing data loss.
Solution
ring_buffer: Add flb_ring_buffer_get_used() to check pending data size
input_thread: Add is_paused flag to track pause acknowledgement from threaded inputs
input_chunk: Bypass pause check during shutdown to allow ring buffer flush, add flb_input_chunk_total_ring_buffers_size() helper
engine: During grace period, wait for all threaded inputs to acknowledge pause and drain ring buffers before final shutdown
Fixes #11338
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
The "ring buffer pending" log is rarely seen because:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes
✏️ Tip: You can customize this high-level summary in your review settings.