I have a large number of logs to collect, and the simplest way to manage them is to use the filename as a source. However, this can lead to too many sources and hurt performance, so we merge some of them. But in the transform stage, different logs may need different multi-line matching conditions or different timestamp-extraction rules, which could result in a very large number of transforms, possibly over a thousand. What are some good solutions for this scenario? Our daily log collection volume is more than 30 TB, the logging scenarios are complex, encodings are inconsistent, and there are many distinct log streams, so the cleansing rules are complicated.
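The question doesn't name a specific pipeline (the "source"/"transform" wording suggests a log collector such as Vector, but the thread doesn't confirm one), so the sketch below is generic Python rather than any collector's config. It shows one common way to keep the transform count bounded: a single table-driven transform keyed by filename pattern, where each table row carries that log family's multi-line start condition and timestamp format. Every rule, path, and field name here is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Hypothetical rule table: one row per log *family* (matched by filename
# pattern) instead of one dedicated transform per file. Every pattern,
# path, and name here is illustrative.
RULES = [
    {
        "path": re.compile(r".*/nginx/.*\.log$"),
        "line_start": re.compile(r"^\d{4}-\d{2}-\d{2}"),  # starts a new event
        "ts": re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"),
        "ts_fmt": "%Y-%m-%d %H:%M:%S",
    },
    {
        "path": re.compile(r".*/app/.*\.log$"),
        "line_start": re.compile(r"^\["),
        "ts": re.compile(r"^\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})"),
        "ts_fmt": "%d/%b/%Y:%H:%M:%S",
    },
]

def rule_for(path: str) -> Optional[dict]:
    """First rule whose filename pattern matches; one table lookup replaces
    a dedicated transform per source."""
    return next((r for r in RULES if r["path"].match(path)), None)

def transform(path: str, lines: Iterable[str]) -> Iterator[dict]:
    """One generic multi-line + timestamp transform, parameterized by the
    rule table. Adding a log family means adding a table row, not another
    transform, so the transform count stays constant."""
    rule = rule_for(path)
    if rule is None:                       # unknown family: pass through raw
        yield from ({"message": line} for line in lines)
        return
    event: list[str] = []
    for line in lines:
        if event and rule["line_start"].match(line):
            yield _finish(rule, event)     # previous event is complete
            event = []
        event.append(line)                 # continuation lines accumulate
    if event:
        yield _finish(rule, event)

def _finish(rule: dict, event: list[str]) -> dict:
    m = rule["ts"].search(event[0])
    ts = datetime.strptime(m.group(1), rule["ts_fmt"]) if m else None
    return {"timestamp": ts, "message": "\n".join(event)}
```

Because the rule is resolved once per file rather than once per line, each event pays only for the two regexes its family actually needs, which matters at 30 TB/day.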
Replies: 1 comment

Hmm, this is a challenging task, and I am sure I am lacking context given the description. I am wondering whether standard consolidation/grouping practices would help here, e.g. normalizing logs early in the pipeline and introducing a set of patterns to check against. If you add more details and examples, maybe we can come up with better recommendations.
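To make the "normalize early" suggestion concrete, here is a minimal Python sketch of a stage that runs before any pattern matching: it forces raw bytes into UTF-8 and rewrites known timestamp shapes to ISO 8601, so every downstream pattern checks one canonical form. The candidate encodings and timestamp patterns are my assumptions, not details from the thread.

```python
import re
from datetime import datetime

# Encodings to try in order before falling back; illustrative guesses.
CANDIDATE_ENCODINGS = ("utf-8", "gb18030")

def decode_line(raw: bytes) -> str:
    for enc in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")  # never drop the line

# Each known timestamp shape gets rewritten to ISO 8601 so later stages
# need exactly one timestamp pattern.
TS_FORMATS = [
    (re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"), "%Y-%m-%d %H:%M:%S"),
    (re.compile(r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}"), "%d/%b/%Y:%H:%M:%S"),
]

def normalize(raw: bytes) -> str:
    line = decode_line(raw).rstrip("\r\n")
    for pattern, fmt in TS_FORMATS:
        m = pattern.search(line)
        if m:
            iso = datetime.strptime(m.group(0), fmt).isoformat()
            return line[: m.start()] + iso + line[m.end():]
    return line  # no known timestamp: leave untouched for later triage
```

Once every line is UTF-8 with an ISO timestamp, the pattern set the reply mentions only has to cover normalized forms, which is what keeps it from growing with the number of raw log variants.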