Commit 331b498

in_tail: Add generic.encoding parameter descriptions
Also I added the reason why we need to support these parameters and how to use them.

Signed-off-by: Hiroshi Hatake <[email protected]>
1 parent ee6e4ff commit 331b498

1 file changed

+84
-0
lines changed

pipeline/inputs/tail.md

Lines changed: 84 additions & 0 deletions
@@ -38,6 +38,7 @@ The plugin supports the following configuration parameters:
| `File_Cache_Advise` | Set the `posix_fadvise` in `POSIX_FADV_DONTNEED` mode. This reduces the usage of the kernel file cache. This option is ignored if not running on Linux. | `On` |
| `Threaded` | Indicates whether to run this input in its own [thread](../../administration/multithreading.md#inputs). | `false` |
| `Unicode.Encoding` | Set the source character encoding of the input, which is converted to UTF-8. Currently, UTF-16LE, UTF-16BE, and auto are supported. | _none_ |
| `Generic.Encoding` | Set the source character encoding of the input, which is converted to UTF-8. Currently, ShiftJIS, UHC, GBK, GB18030, Big5, Win866, Win874, Win1250, Win1251, Win1252, Win1253, Win1254, Win1255, and Win1256 are supported. | _none_ |

{% hint style="info" %}
If the database parameter `DB` isn't specified, by default the plugin reads each target file from the beginning. This might cause unwanted behavior. For example, when a line is bigger than `Buffer_Chunk_Size` and `Skip_Long_Lines` isn't turned on, the file will be read from the beginning of each `Refresh_Interval` until the file is rotated.
@@ -433,3 +434,86 @@ While file rotation is handled, there are risks of potential log loss when using
- Final note: the `Path` patterns can't match the rotated files. Otherwise, the rotated file would be read again and lead to duplicate records.

{% endhint %}

## Character Encoding Conversion

This feature allows Fluent Bit to convert logs from various character encodings into the standard UTF-8 format.
This is crucial for processing logs from systems, especially Windows, that use legacy or non-UTF-8 encodings.
Proper conversion ensures that your log data is correctly parsed, indexed, and searchable.

### When to Use This Feature

You should use this feature if your log files or messages are not in UTF-8 and you are seeing garbled or incorrectly rendered characters.
This is common in environments that use:

* Modern Windows applications that log in UTF-16.

* Legacy Windows systems with applications that use traditional code pages (e.g., ShiftJIS, GBK, Win1252).

### Configuration Parameters

To enable encoding conversion, set one of the following two parameters in an input plugin configuration.

1. `Unicode.Encoding`

Use this parameter for high-performance conversion of UTF-16 encoded logs to UTF-8. The conversion is accelerated with SIMD instructions on modern processors.

* Use Case: Ideal for logs coming from modern Windows environments that default to UTF-16.
* Supported Values:
  * UTF-16LE (Little-Endian)
  * UTF-16BE (Big-Endian)

2. `Generic.Encoding`

Use this parameter to convert from a wide variety of other character encodings, particularly legacy Windows code pages.

* Use Case: Essential for logs from older systems or applications configured for specific regions, common in East Asia and Eastern Europe.
* Supported Values: You can use any of the names or aliases listed below.

### East Asian Encodings
* `ShiftJIS` (Aliases: `SJIS`, `CP932`, `Windows-31J`)
* `GB18030`
* `GBK` (Alias: `CP936`)
* `UHC` (Unified Hangul Code) (Aliases: `CP949`, `Windows-949`)
* `Big5` (Alias: `CP950`)

### Windows (ANSI) Encodings
* `Win1250` (Central European) (Alias: `CP1250`)
* `Win1251` (Cyrillic) (Alias: `CP1251`)
* `Win1252` (Western European / Latin) (Alias: `CP1252`)
* `Win1253` (Greek) (Alias: `CP1253`)
* `Win1254` (Turkish) (Alias: `CP1254`)
* `Win1255` (Hebrew) (Alias: `CP1255`)
* `Win1256` (Arabic) (Alias: `CP1256`)

### DOS (OEM) Encodings
* `Win866` (Cyrillic - DOS) (Alias: `CP866`)
* `Win874` (Thai) (Alias: `CP874`)
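
Because aliases are accepted in place of the primary names, a configuration can use whichever name matches the terminology of the source system. The following is a minimal sketch, not a tested configuration: it assumes a log file written in Windows-1251 and selects it through the `CP1251` alias. The path is a hypothetical placeholder.

```yaml
pipeline:
  inputs:
    - name: tail
      # Hypothetical path to a log file written in Windows-1251 (Cyrillic)
      path: 'C:\path\to\your\cp1251.log'
      # Alias for Win1251, taken from the Windows (ANSI) list
      generic.encoding: CP1251
```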

### Configuration Example

Here is an example of how to use `Generic.Encoding` with the Tail input plugin to read a log file encoded in ShiftJIS.

{% tabs %}
{% tab title="fluent-bit.yaml" %}

```yaml
pipeline:
  inputs:
    - name: tail
      path: 'C:\path\to\your\sjis.log'
      generic.encoding: ShiftJIS
```

{% endtab %}
{% tab title="fluent-bit.conf" %}

```text
[INPUT]
    Name             tail
    Path             C:\path\to\your\sjis.log
    Generic.Encoding ShiftJIS
```

{% endtab %}
{% endtabs %}
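
`Unicode.Encoding` is set in the same place. The following is a minimal sketch, assuming the YAML key is written `unicode.encoding` by analogy with `generic.encoding` and that the file is UTF-16LE. The path is a hypothetical placeholder.

```yaml
pipeline:
  inputs:
    - name: tail
      # Hypothetical path to a UTF-16LE encoded log file
      path: 'C:\path\to\your\utf16.log'
      # Assumed key spelling; the value is taken from the supported list
      unicode.encoding: UTF-16LE
```

Only one of the two parameters applies to a given input: `Unicode.Encoding` for UTF-16 sources, and `Generic.Encoding` for the legacy code pages.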
