in_tail: only rely on fstat() to detect file rotation #10280
Example configuration file
Debug logs file truncation
Thanks for contributing this PR. This is a very sensitive change (actually, this is one of the parts of the plugin I avoid touching :D). I'm wondering how we can extend testing to avoid regressions; I remember there are a couple of corner cases. Adding @leonardo-albertovich as extra eyes for this one.
@@ -119,15 +119,21 @@ static int tail_fs_check(struct flb_input_instance *ins,
        continue;
    }

    int64_t size_delta = st.st_size - file->size;
    if (size_delta != 0) {
        file->size = st.st_size;
This is the only place in this PR where the change is not restricted to the function scope and might have an impact outside the function. This file->size assignment was not here before (and it could be removed, as we only need size_delta to detect truncation). I introduced it only for consistency with the other implementations.
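To illustrate the point that size_delta alone is enough to notice truncation, here is a minimal standalone sketch. It is not the plugin code: poll_size_delta() and demo_truncation() are hypothetical helpers, and the temporary file created with tmpfile() stands in for a tailed log on a local filesystem (no NFS caching involved).

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Fluent Bit): compares the size reported
 * by fstat() with the size seen on the previous poll. Stores the difference
 * in *size_delta and remembers the new size. Returns 0 on success, -1 if
 * fstat() fails.
 */
static int poll_size_delta(int fd, int64_t *last_size, int64_t *size_delta)
{
    struct stat st;

    if (fstat(fd, &st) != 0) {
        return -1;
    }

    *size_delta = (int64_t) st.st_size - *last_size;
    if (*size_delta != 0) {
        *last_size = (int64_t) st.st_size;
    }
    return 0;
}

/*
 * Hypothetical demo: grow a temporary file by 10 bytes, then truncate it to
 * zero, polling after each step. A negative delta signals truncation.
 */
static int demo_truncation(int64_t *grow_delta, int64_t *trunc_delta)
{
    FILE *fp = tmpfile();
    int64_t last_size = 0;
    int ret = -1;
    int fd;

    if (fp == NULL) {
        return -1;
    }
    fd = fileno(fp);

    if (write(fd, "0123456789", 10) == 10 &&
        poll_size_delta(fd, &last_size, grow_delta) == 0 &&
        ftruncate(fd, 0) == 0 &&
        poll_size_delta(fd, &last_size, trunc_delta) == 0) {
        ret = 0;
    }

    fclose(fp);
    return ret;
}
```

In this sketch the first poll yields a positive delta (the file grew) and the second a negative one (the file shrank), without consulting the read offset at any point.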
I had the feeling file_tail might be one of those things that was implemented early on and that everyone uses: the kind of thing you don't want to change if it's not broken. Reading logs from NFS might not be the most common use case, but as people move workloads to the cloud, shared remote storage will become more common. Also, this bug only surfaces with particular NFS configurations where file metadata is cached. The PR is actually the second approach to solving the issue (I had a fix deployed that worked, but it was too complex and overthought). The current proposal is very limited in scope, and its impact can be easily grasped by reading the code changes. The only possible side effect is that, on NFS with a metadata cache, detecting truncation might be delayed until the metadata cache is updated. In any case, this is much better than having false truncation/rotation detections that lead to repeated log ingestion. I've been running a fork with this fix in production for some days now without issues.
As for what this PR tries to solve, I don't think there is a feasible way of testing it, as it relies on the stream offset and fstat() providing unsynchronized information, which is something you cannot easily produce artificially.
When reading logs from NFS, the results returned by fstat() might be outdated. This leads to looped ingestion: the plugin detects truncation over and over again even though the file has not been truncated, because it compares the current stream offset with the file size reported in the (stale) metadata.
This PR solves this by relying exclusively on the information provided by fstat() to decide whether or not to rotate. Before, the plugin compared the offset with fstat()->file_size, and the offset could be larger than the file_size reported by fstat(), because that value comes from cache and takes some time to update.
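The difference between the two checks can be sketched as follows. This is a simplified illustration rather than the plugin's code: struct tracked_file and both functions are hypothetical names, and treating a negative size_delta as truncation is an assumption based on the description above.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-file state the plugin keeps. */
struct tracked_file {
    int64_t offset; /* bytes already consumed from the stream */
    int64_t size;   /* size reported by the previous fstat() call */
};

/*
 * Offset-based check (the behavior described as problematic): if st_size
 * comes from a stale metadata cache, the read offset can already be ahead
 * of it, so truncation is reported even though the file only grew.
 */
static int truncated_by_offset(const struct tracked_file *f, int64_t st_size)
{
    return f->offset > st_size;
}

/*
 * fstat()-only check (the idea behind this PR): compare the newly reported
 * size with the previously reported one. Both values come from the same
 * source, so a stale cache only delays detection instead of producing a
 * false positive. Assumption: a negative delta means truncation.
 */
static int truncated_by_fstat(struct tracked_file *f, int64_t st_size)
{
    int64_t size_delta = st_size - f->size;

    if (size_delta != 0) {
        f->size = st_size;
    }
    return size_delta < 0;
}
```

For example, with offset = 1500 and a cached st_size that still reports 1000 (equal to the previously recorded size), truncated_by_offset() reports truncation while truncated_by_fstat() does not. Once the cache catches up and reports a larger size, truncated_by_fstat() simply records the new size.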
Fixes #10276
Enter [N/A] in the box, if an item is not applicable to your change.
Testing
Before we can approve your change, please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
Set the ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.