This project compares data formats by serialized file size and compression time.
It generates N Library data structures with N Book entries, then serializes them in three formats:
- Avro (AvroLibraryCompressionTest)
  - We write a single Avro record (with nested "Book" array, "Author") to different files:
    - No compression
    - Snappy
    - Deflate
  - Compare file sizes and time taken.
- JSON (JsonLibraryCompressionTest)
  - We serialize the same structure to JSON (using Jackson).
  - Then write:
    - no-compression .json
    - gzip-compressed .json.gz
    - snappy-compressed .json.snappy
  - Compare sizes and performance.
- Protobuf (ProtobufLibraryCompressionTest)
  - Using the same concept (library, books, author).
  - We generate a Library message, serialize to bytes, then store:
    - raw .bin
    - gzip
    - snappy
  - Compare file sizes and serialization speed.
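
The size-and-time comparison described above can be sketched with the JDK alone. This is a minimal illustration, not the project's actual test code: the class name `CompressionSketch`, the `Codec` interface, and the hand-built JSON-like payload are all hypothetical, and Snappy is left out because it requires a third-party library.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compares output size and time for JDK-provided codecs.
public class CompressionSketch {

    /** Wraps a raw stream in a compressing stream (assumed helper type). */
    interface Codec {
        OutputStream wrap(OutputStream out) throws IOException;
    }

    /** Builds a made-up JSON-like library payload with n book entries. */
    static byte[] buildPayload(int n) {
        StringBuilder sb = new StringBuilder("{\"books\":[");
        for (int i = 0; i < n; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"title\":\"Book ").append(i)
              .append("\",\"author\":{\"name\":\"Author ").append(i % 10).append("\"}}");
        }
        return sb.append("]}").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Compresses data in memory with the given codec and returns the bytes. */
    static byte[] compress(byte[] data, Codec codec) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the wrapper flushes any trailer the codec needs to write.
        try (OutputStream out = codec.wrap(buffer)) {
            out.write(data);
        }
        return buffer.toByteArray();
    }

    /** Runs one codec and prints its output size and elapsed wall-clock time. */
    static void report(String label, byte[] data, Codec codec) throws IOException {
        long start = System.nanoTime();
        byte[] result = compress(data, codec);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("%-8s %8d bytes %8d us%n", label, result.length, micros);
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = buildPayload(10_000);
        report("none", payload, out -> out);              // pass-through baseline
        report("gzip", payload, GZIPOutputStream::new);
        report("deflate", payload, DeflaterOutputStream::new);
    }
}
```

A single timed run like this is noisy because of JIT warm-up, so timings are more trustworthy when each codec is run several times and the first iterations are discarded.
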
Requirements:
- Java 11+
- Maven 3.6+
You might want to compile your LibraryOuterClass using protoc.
On Windows:
protoc.exe --java_out=. .\library.proto
On Linux it is simply:
protoc --java_out=. library.proto
Then compile and run:
mvn clean compile
# windows
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.JsonLibraryCompressionTest"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.AvroLibraryCompressionTest"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.ProtobufLibraryCompressionTest"
# linux
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.AvroLibraryCompressionTest"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.JsonLibraryCompressionTest"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.ProtobufLibraryCompressionTest"
# Line-by-line comparison:
# windows
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.JsonLibraryCompressionTestLineByLine"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.AvroLibraryCompressionTestLineByLine"
mvn exec:java -D"exec.mainClass=com.github.paf91.compressiontest.ProtobufLibraryCompressionTestLineByLine"
# linux
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.AvroLibraryCompressionTestLineByLine"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.JsonLibraryCompressionTestLineByLine"
mvn exec:java -Dexec.mainClass="com.github.paf91.compressiontest.ProtobufLibraryCompressionTestLineByLine"