Commit d1ddc43

in_tail: Add a description and note for Unicode.Encoding parameter
Signed-off-by: Hiroshi Hatake <[email protected]>
1 parent 950013e commit d1ddc43

1 file changed, +15 -1 lines changed

pipeline/inputs/tail.md

Lines changed: 15 additions & 1 deletion
@@ -37,9 +37,23 @@ The plugin supports the following configuration parameters:
 | `Static_Batch_Size` | Set the maximum number of bytes to process per iteration for the monitored static files (files that already exist upon Fluent Bit start). | `50M` |
 | `File_Cache_Advise` | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. | `On` |
 | `Threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
+| `Unicode.Encoding` | Set the character encoding of the monitored files. Supported values are `UTF-16LE`, `UTF-16BE`, and `auto`. | _none_ |
+
+{% hint style="info" %} If the database parameter `DB` isn't
+specified, by default the plugin reads each target file from the
+beginning. This might cause unwanted behavior. For example, when a
+line is bigger than `Buffer_Chunk_Size` and `Skip_Long_Lines` isn't
+turned on, the file is read again from the beginning at every
+`Refresh_Interval` until the file is rotated. {% endhint %}
 
 {% hint style="info" %}
-If the database parameter `DB` isn't specified, by default the plugin reads each target file from the beginning. This might cause unwanted behavior. For example, when a line is bigger than `Buffer_Chunk_Size` and `Skip_Long_Lines` isn't turned on, the file will be read from the beginning of each `Refresh_Interval` until the file is rotated.
+Note that `Unicode.Encoding` depends on the simdutf library, which requires C++11 or later.
+As a result, this feature isn't supported on older platforms that lack a C++11 toolchain.
+In addition, `Unicode.Encoding auto` isn't reliable for every use case,
+because automatic encoding detection can sometimes guess the wrong encoding.
+If the encoding of the target files is known in advance, we recommend setting `UTF-16LE` or `UTF-16BE` explicitly.
+Internally, this parameter requires chunk and buffer sizes that are aligned to 2 bytes.
+If they aren't 2-byte aligned, Fluent Bit aligns them automatically to avoid splitting characters at chunk boundaries.
 {% endhint %}
 
 ## Monitor a large number of files
