File source should expand support for other file compression algorithms (zip, bzip, lzma, zstd etc.) #13500

Closed
neuronull opened this issue Jul 11, 2022 · 7 comments
Labels
`source: file` (Anything `file` source related)
`type: feature` (A value-adding code addition that introduces new functionality.)

Comments

@neuronull
Contributor

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Use Cases

While less common than gzip, these compression algorithms are in the field and support for them would expand the scope of users leveraging the file source.

Attempted Solutions

No response

Proposal

No response

References

No response

Version

0.23

@neuronull added the `source: file` and `type: feature` labels on Jul 11, 2022
@hhromic
Contributor

hhromic commented Jul 11, 2022

While this feature request is indeed very useful, please don't forget to take a look at this issue #13193 first :)
At the moment we are unable to use Vector to process gzipped files due to that issue. We think it is quite an important bug :(

@arunslalgit

Thanks for creating the issue ticket. Some Windows servers, where the organization allows only native .zip compression for log files, will benefit from this (including mine, of course!)

@zamazan4ik
Contributor

zamazan4ik commented Aug 16, 2022

Just a veeeeeeeeeeery small note about Zip. Zip is not a compression method; it is an archive file format that internally supports different compression methods.
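To illustrate that distinction: each entry in a ZIP archive records its own compression-method ID in its local file header. The sketch below is a hypothetical helper (not zip-rs or Vector code) that assumes the standard local file header layout and reads that field for the first entry; the ID table covers only a few common values from PKWARE's APPNOTE.TXT.

```rust
/// Local file header signature "PK\x03\x04", read as a little-endian u32.
const LOCAL_FILE_HEADER_SIG: u32 = 0x0403_4b50;

/// Maps a few ZIP compression-method IDs to human-readable names.
fn method_name(id: u16) -> &'static str {
    match id {
        0 => "stored (no compression)",
        8 => "deflate",
        12 => "bzip2",
        14 => "lzma",
        93 => "zstd",
        _ => "other/unknown",
    }
}

/// Returns the compression-method ID of the first entry in a ZIP archive,
/// or None if the buffer does not start with a local file header.
fn first_entry_method(buf: &[u8]) -> Option<u16> {
    // The fixed header is 30 bytes, but only the first 10 are needed here.
    if buf.len() < 10 {
        return None;
    }
    let sig = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if sig != LOCAL_FILE_HEADER_SIG {
        return None;
    }
    // Offsets: 4 = version needed, 6 = general-purpose flags, 8 = method.
    Some(u16::from_le_bytes([buf[8], buf[9]]))
}

fn main() {
    // Hand-built header prefix: signature, version 20, flags 0, method 8.
    let header: [u8; 10] = [0x50, 0x4b, 0x03, 0x04, 0x14, 0x00, 0x00, 0x00, 0x08, 0x00];
    let id = first_entry_method(&header).expect("not a ZIP local file header");
    println!("method {} = {}", id, method_name(id));
}
```

Because every entry can use a different method, "Zip support" in practice means supporting a ZIP reader plus each inner algorithm. A real reader would normally walk the central directory at the end of the archive rather than trusting only the first local header.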

As a partial maintainer of zip-rs (https://github.com/zip-rs/zip), I would say it could be somewhat difficult to add Zip support, since the zip-rs library is almost unmaintained. Maybe the VectorDev organization will be able to revive this library (I hope so).

For other algorithms, I guess this library could be interesting: https://github.com/Nemo157/async-compression/ (alternatively, each algorithm could be integrated directly with its corresponding library).
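Whichever libraries are used, a file source would first need to decide which decoder to apply to a given file. One common approach, assumed here rather than taken from Vector's code, is to sniff the leading "magic" bytes. The `Compression` enum and `detect` function are hypothetical names; the signatures are the standard ones for gzip, zstd, bzip2, xz and ZIP local file headers.

```rust
/// Hypothetical set of formats a file source might recognize.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Compression {
    Gzip,
    Zstd,
    Bzip2,
    Xz,
    Zip,
    Plain,
}

/// Guesses the compression format from the first bytes of a file.
fn detect(header: &[u8]) -> Compression {
    if header.starts_with(&[0x1f, 0x8b]) {
        Compression::Gzip
    } else if header.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
        // zstd frame magic 0xFD2FB528, stored little-endian.
        Compression::Zstd
    } else if header.starts_with(b"BZh") {
        Compression::Bzip2
    } else if header.starts_with(&[0xfd, b'7', b'z', b'X', b'Z', 0x00]) {
        Compression::Xz
    } else if header.starts_with(b"PK\x03\x04") {
        Compression::Zip
    } else {
        Compression::Plain
    }
}

fn main() {
    // In-memory samples stand in for the first bytes of real log files.
    let samples: [(&str, &[u8]); 3] = [
        ("app.log.gz", &[0x1f, 0x8b, 0x08, 0x00]),
        ("app.log.zst", &[0x28, 0xb5, 0x2f, 0xfd]),
        ("app.log", b"2022-07-11 INFO started"),
    ];
    for &(name, bytes) in samples.iter() {
        println!("{}: {:?}", name, detect(bytes));
    }
}
```

Content sniffing avoids relying on file extensions, which log rotation tools do not always set consistently; the detected variant would then select the matching decoder.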

From my perspective, support for other compression algorithms is really important. In particular, I am interested in zstd support, since it is really fast and widely adopted across the industry (e.g. built-in BTRFS compression).

@zamazan4ik
Contributor

Related issue: #2302

@zamazan4ik
Contributor

@jszwedko Could you please clarify which compression algorithms exactly Vector is interested in? I think I can help with the integration (since I have some free time). Probably some of them are more important than others. Also, since the compression story seems to be quite big (many compression algorithms, different sinks support different subsets of them, Vector does not yet support setting the compression level, etc.), maybe it would be a good idea to create an epic for the compression story and track the progress there.

@jszwedko
Member

Thanks for the note @zamazan4ik! I'm actually curious to hear from others using the file source in the wild about which compression algorithms they typically use. @neuronull dug up the original motivation for this issue, and the user was asking for zip support.

As you note, compression support is spotty and source / sink specific. It does seem like we could take a more generalized approach here, similar to how codecs are implemented, to generalize the compression. It'd require some deep architectural work though, which is probably best started with an RFC document.
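A rough sketch of what a generalized, codec-like compression layer might look like, assuming a hypothetical `Algorithm` enum parsed from a config string and a single hypothetical `decompressing_reader` function that any source could call. This is not Vector's API; only the pass-through case is implemented so the example stays std-only, and the real design would be settled in the RFC mentioned above.

```rust
use std::io::{self, Read};
use std::str::FromStr;

/// Hypothetical algorithm selector, e.g. from a `compression` config option.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Algorithm {
    Plain,
    Gzip,
    Zstd,
}

impl FromStr for Algorithm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(Algorithm::Plain),
            "gzip" => Ok(Algorithm::Gzip),
            "zstd" => Ok(Algorithm::Zstd),
            other => Err(format!("unknown compression algorithm: {}", other)),
        }
    }
}

/// Wraps a raw byte stream in a decompressing reader for the chosen algorithm,
/// so sources do not carry their own per-algorithm logic. Real variants would
/// delegate to decoder crates; here they return an error.
fn decompressing_reader(alg: Algorithm, inner: Box<dyn Read>) -> io::Result<Box<dyn Read>> {
    match alg {
        Algorithm::Plain => Ok(inner),
        other => Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{:?} decoding is not wired up in this sketch", other),
        )),
    }
}

fn main() -> io::Result<()> {
    let alg: Algorithm = "none"
        .parse()
        .map_err(|e: String| io::Error::new(io::ErrorKind::InvalidInput, e))?;
    let mut reader = decompressing_reader(alg, Box::new(&b"line one\nline two\n"[..]))?;
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    print!("{}", text);
    Ok(())
}
```

Dispatching through `Box<dyn Read>` keeps each source agnostic of the algorithm at the cost of dynamic dispatch; an async pipeline would presumably use `AsyncRead` wrappers (such as those in async-compression) instead.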

In absence of that, we are happy to see additional algorithms added to specific sources and sinks. It sounds like zip could be another good one for the file source. It would be nice to surface more demand for the other algorithms mentioned in the title.

@jszwedko
Member

jszwedko commented Jun 30, 2023

Closing in lieu of #16891


5 participants