[Engine] Evaluation of the ZSTD (Zstandard) compression algorithm for log data #19
Comments
Note that Redis is not allowed as a dependency in the ASF due to its license.
I checked: Redis core itself is BSD-3-Clause licensed, and we do not use any extensions/modules that contain code under the RSAL. Would that still be a problem? I'm a bit confused about these matters and hope to learn more. Also, in skywalking-python, we have a … In the future, we could switch to shipping with kvrocks, but unfortunately it doesn't yet fully support the stream consumer group commands that we rely on heavily.
Are you only using Redis core? Many modules are AGPL or even Commons Clause. I didn't check the features you are going to use, so this is just a reminder. Also, you mentioned it works as a buffer, which is usually a queue server's role; why did you choose a Redis queue?
OK, as I said, for now even an AGPL module is fine, until you want to move this into the ASF.
Understood, thank you!
TODO: implement a self-optimizer that monitors the compression-ratio metric; if the ratio degrades significantly, retrain the dictionary and propagate it to every consumer to restore compression performance.
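One way the self-optimizer TODO could look is a rolling-window monitor. This is a minimal sketch under stated assumptions: the `retrain` callback is a hypothetical hook standing in for "retrain the dictionary and propagate it to each consumer", and the window size and 20% degradation threshold are illustrative, not tuned values. The ratio is computed as raw bytes over compressed bytes, so higher is better.

```python
from collections import deque
from typing import Callable, Optional


class CompressionRatioMonitor:
    """Rolling compression-ratio tracker that fires a (hypothetical) retrain hook."""

    def __init__(self, retrain: Callable[[], None], window: int = 1000,
                 degradation: float = 0.2) -> None:
        self.retrain = retrain            # hypothetical: retrain + propagate dict
        self.window = window
        self.degradation = degradation    # tolerated relative drop vs. baseline
        self.samples = deque(maxlen=window)  # (raw_len, compressed_len) pairs
        self.baseline: Optional[float] = None

    def ratio(self) -> float:
        raw = sum(r for r, _ in self.samples)
        comp = sum(c for _, c in self.samples)
        return raw / comp if comp else 0.0

    def record(self, raw_len: int, compressed_len: int) -> bool:
        """Record one message; return True if a retrain was triggered."""
        self.samples.append((raw_len, compressed_len))
        if len(self.samples) < self.window:
            return False                  # wait for a full window before judging
        current = self.ratio()
        if self.baseline is None:
            self.baseline = current       # first full window sets the baseline
            return False
        if current < self.baseline * (1 - self.degradation):
            self.retrain()
            self.samples.clear()          # measure the new dictionary from scratch
            self.baseline = None
            return True
        return False
```

Resetting the baseline after each retrain means the new dictionary is judged against its own first window rather than against the old one.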
The AIOps engine will receive a large volume of log data from SkyWalking, and we decided to use a Redis stream as the buffer before stream processing. One notable issue is that standard zlib cannot compress logs well when they arrive one by one: each message is too short for the compressor to build up useful context (see how-compression-algorithm-works), so the buffer costs extra memory/disk.
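The effect is easy to reproduce with the standard-library `zlib` module. The sketch below uses synthetic log lines (not real SkyWalking data) and compares compressing each line on its own, as it would be when pushed into the stream, against compressing the same lines as one block. Batching is not an option for a per-message buffer, which is what motivates a shared dictionary.

```python
import zlib


def per_message_size(logs):
    """Total bytes when each log is zlib-compressed on its own, as it arrives."""
    return sum(len(zlib.compress(line.encode("utf-8"))) for line in logs)


def batch_size(logs):
    """Bytes when the same logs are compressed together as one block."""
    return len(zlib.compress("\n".join(logs).encode("utf-8")))


# Synthetic, illustrative log lines.
logs = [
    f"2022-05-01 12:00:{i % 60:02d} INFO [order-service] GET /api/orders/{i} "
    f"status=200 latency={i % 50}ms"
    for i in range(500)
]
raw = sum(len(line.encode("utf-8")) for line in logs)
print("raw:", raw, "per-message:", per_message_size(logs), "batch:", batch_size(logs))
```

Each short line carries its own zlib header and checksum and has almost no internal repetition, so per-message output stays close to the raw size, while the batch can reuse everything the lines share.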
Note that we delete logs from the stream immediately after processing, but it is still worth compressing them to save network bandwidth and avoid overloading Redis.
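To make the produce/consume path concrete, here is a minimal sketch of where compression sits around a stream entry, using zlib as the current baseline (swapping in ZSTD would change only the compress/decompress calls). The field name `log` is a hypothetical choice; producer and consumer only need to agree on it.

```python
import zlib

# Hypothetical field name under which the compressed log is stored in each
# stream entry.
PAYLOAD_FIELD = b"log"


def encode_entry(log_line: str, level: int = 6) -> dict:
    """Producer side: compress one log line into a stream-entry field mapping."""
    return {PAYLOAD_FIELD: zlib.compress(log_line.encode("utf-8"), level)}


def decode_entry(fields: dict) -> str:
    """Consumer side: decompress the payload field back into the log line."""
    return zlib.decompress(fields[PAYLOAD_FIELD]).decode("utf-8")
```

With redis-py, a producer could call `r.xadd("logs", encode_entry(line))`, and a consumer could call `decode_entry(fields)` on each entry it reads, then `r.xdel("logs", entry_id)` once processing is done (the stream name `logs` is a placeholder). Because the payload is binary, the client for this stream should not use `decode_responses=True`, which would try to decode the compressed bytes as UTF-8.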
So here comes ZSTD, which can improve our flow in two ways:
(I) simply replacing zlib with ZSTD to achieve roughly 2x average compression speed;
(II) using a dictionary compressor, that is, learning from a small sample batch of logs and then using that knowledge to further boost compression, which could save extra memory/disk. (TODO: evaluate how to execute the learning phase: do we learn one dictionary per service? Do we learn one unified model, or retrain periodically? etc.)
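The idea behind (II) can be illustrated with the standard library alone. The sketch below deliberately swaps in zlib's preset dictionary (`zdict`) as a stand-in for a trained ZSTD dictionary: both sides share a blob of typical log content that the compressor can back-reference. Unlike ZSTD, deflate can only use the last 32 KiB of a dictionary and has no real training step, so `build_dictionary` here is a naive, hypothetical helper that just keeps the tail of the concatenated samples.

```python
import zlib

DEFLATE_WINDOW = 32 * 1024  # deflate can only reference the last 32 KiB


def build_dictionary(samples):
    """Naive 'training': concatenate sample logs and keep the last 32 KiB.

    Stand-in only; ZSTD's trainer selects frequent segments instead.
    """
    blob = "\n".join(samples).encode("utf-8")
    return blob[-DEFLATE_WINDOW:]


def compress_with_dict(data: bytes, zdict: bytes) -> bytes:
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    # The consumer must use exactly the same dictionary bytes as the producer.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Synthetic sample logs (illustrative only).
samples = [
    f"2022-05-01 12:00:{i % 60:02d} INFO [order-service] GET /api/orders/{i} "
    f"status=200 latency={i % 50}ms"
    for i in range(1000)
]
zdict = build_dictionary(samples)
new_log = (b"2022-05-01 12:01:07 INFO [order-service] GET /api/orders/4242 "
           b"status=200 latency=17ms")
with_dict = compress_with_dict(new_log, zdict)
without_dict = zlib.compress(new_log)
print(len(new_log), len(without_dict), len(with_dict))
```

With pyzstd, the corresponding steps would be `pyzstd.train_dict(samples, dict_size)` on a list of `bytes` samples, then `pyzstd.compress(data, zstd_dict=...)` and `pyzstd.decompress(data, zstd_dict=...)`; check the pyzstd documentation for the exact parameters.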
Some public discussions that support its feasibility:
https://groups.google.com/g/redis-db/c/slk-c33EZ7U/m/tx81gCMDDQAJ - adoption case
http://facebook.github.io/zstd/ - performance comparison
https://github.com/animalize/pyzstd - target python lib for implementation
=======================================
Initial experiment results (suggestions are welcome):
The results below show that ZSTD with a dictionary trained on a very small amount of log data from the same service (the first 1k logs; increasing to 5k doesn't help) saves roughly a third of the memory/disk compared with zlib when storing the remaining 500k logs (86.2 MB → 54.7 MB of payload; 92 MB → 58 MB in the Redis key).
(Further experiments are needed to see whether this generalizes.)
An additional idea: if we compose a good dataset representing "what a normal log looks like", it could serve as universal training data and push the compression ratio even further.
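One possible way to compose such a dataset is stratified sampling across services, so that a chatty service does not dominate the shared dictionary. This is a hypothetical sketch: `build_training_set`, the `logs_by_service` mapping, and the per-service cap of 200 are illustrative assumptions, not part of the current engine.

```python
import random


def build_training_set(logs_by_service, per_service=200, seed=0):
    """Stratified sample: up to `per_service` logs from every service.

    Capping each service keeps high-volume services from dominating the
    shared ("universal") dictionary's training data.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible training set
    training = []
    for service in sorted(logs_by_service):
        logs = logs_by_service[service]
        training.extend(rng.sample(logs, min(per_service, len(logs))))
    rng.shuffle(training)
    return training
```

The returned lines, encoded to bytes, could then be fed to whichever dictionary trainer is used (for example pyzstd's `train_dict`).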
Note: my Redis instance runs in Docker and its bandwidth is slow, which affects the send times below.
ZLIB
size of log in Megabyte 86.237173MB
Time taken to send 500k messages with batch 2000: 12.09048318862915 seconds
92MB used in actual Redis key
ZSTD with dict training
done training dict on first 1000 log samples
func:train_zstd took: 0.06115330 sec
size of log in Megabyte 54.717921MB
Time taken to send 500k messages with batch 2000: 8.131911993026733 seconds
58MB used in actual Redis key
ZSTD with basic compressor [default level]
size of log in Megabyte 88.285950MB
Time taken to send 500k messages with batch 2000: 9.860241889953613 seconds
ZSTD with rich memory compressor [default level] (slightly lower compression ratio)
size of log in Megabyte 88.386098MB
Time taken to send 500k messages with batch 2000: 9.413931131362915 seconds
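The relative figures can be re-derived from the numbers logged above. The sketch below only copies those measurements; note that the send times include the Docker Redis round trips, so the speedups are end-to-end rather than pure compression speed.

```python
# Figures copied from the runs above: (payload size in MB, seconds to send 500k msgs).
RESULTS = {
    "zlib": (86.237173, 12.09048318862915),
    "zstd+dict": (54.717921, 8.131911993026733),
    "zstd basic": (88.285950, 9.860241889953613),
    "zstd richmem": (88.386098, 9.413931131362915),
}


def summarize(results, baseline="zlib", n_messages=500_000):
    """Size saving, throughput, and end-to-end speedup relative to `baseline`."""
    base_mb, base_s = results[baseline]
    return {
        name: {
            "size_saving": 1 - mb / base_mb,   # negative means larger than baseline
            "msgs_per_sec": n_messages / secs,
            "speedup": base_s / secs,
        }
        for name, (mb, secs) in results.items()
    }


for name, s in summarize(RESULTS).items():
    print(f"{name:13s} saving {s['size_saving']:+.1%}  "
          f"{s['msgs_per_sec']:,.0f} msg/s  speedup {s['speedup']:.2f}x")
```

By these numbers the dictionary variant saves about 36.5% of the payload versus zlib (and 1 - 58/92 ≈ 37% in the Redis key) while sending about 1.49x faster, whereas basic and rich-memory ZSTD produce roughly 2.4% more bytes than zlib but send about 1.2x to 1.3x faster.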