How to compress and decompress data in Java using Snappy or bzip2
Introduction
This post demonstrates how to compress and decompress data in Java using Snappy or bzip2.
Environments
Java 1.8
1. The Snappy method
Snappy is a high-performance compression library. The project describes itself as follows:
Snappy is a compression/decompression library. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. For instance, compared to the fastest mode of zlib, Snappy is an order of magnitude faster for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger. On a single core of a Core i7 processor in 64-bit mode, Snappy compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more.
1.1 the pom.xml
1.2 the example code
The key points are as follows:
Prepare a large file to compress. I used Python to generate a large random file; you can view this article to prepare one.
The compression code simply compresses the bytes and measures the elapsed time.
The decompression code decompresses the compressed bytes and checks the resulting size and the elapsed time.
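The steps above can be sketched as a small timing harness. This is a minimal sketch, not the post's original code: the `Codec` interface, the `Result` class and the `measure` method are hypothetical names introduced for illustration, and the JDK's DEFLATE is used as a stand-in codec so that the example runs without extra dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressionBenchmark {

    /** Hypothetical abstraction: any byte[] -> byte[] transform that may fail with an I/O error. */
    @FunctionalInterface
    public interface Codec {
        byte[] apply(byte[] input) throws IOException;
    }

    /** Sizes and timings of one compress/decompress round trip. */
    public static final class Result {
        public final int originalSize;
        public final int compressedSize;
        public final long compressMillis;
        public final long decompressMillis;
        public final boolean roundTripOk;

        Result(int originalSize, int compressedSize, long compressMillis,
               long decompressMillis, boolean roundTripOk) {
            this.originalSize = originalSize;
            this.compressedSize = compressedSize;
            this.compressMillis = compressMillis;
            this.decompressMillis = decompressMillis;
            this.roundTripOk = roundTripOk;
        }
    }

    /** Compresses, then decompresses, timing each step and verifying the restored bytes. */
    public static Result measure(byte[] original, Codec compress, Codec decompress) throws IOException {
        long start = System.nanoTime();
        byte[] compressed = compress.apply(original);
        long afterCompress = System.nanoTime();
        byte[] restored = decompress.apply(compressed);
        long afterDecompress = System.nanoTime();
        return new Result(original.length, compressed.length,
                (afterCompress - start) / 1_000_000,
                (afterDecompress - afterCompress) / 1_000_000,
                Arrays.equals(original, restored));
    }

    /** Stand-in compressor: JDK DEFLATE at its fastest level (not Snappy). */
    public static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Stand-in decompressor matching deflate(). */
    public static byte[] inflate(byte[] input) throws IOException {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[64 * 1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IOException("Truncated or unsupported DEFLATE stream");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException("Corrupt DEFLATE stream", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // In-memory random bytes stand in for the large random file.
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(42).nextBytes(data);

        Result r = measure(data, CompressionBenchmark::deflate, CompressionBenchmark::inflate);
        System.out.printf("original=%d compressed=%d compress=%dms decompress=%dms ok=%b%n",
                r.originalSize, r.compressedSize, r.compressMillis, r.decompressMillis, r.roundTripOk);
    }
}
```

With the snappy-java library on the classpath (Maven coordinates `org.xerial.snappy:snappy-java`), `Snappy::compress` and `Snappy::uncompress` from `org.xerial.snappy.Snappy` should fit the `Codec` shape directly, since both take and return `byte[]` and declare `IOException`. A single timed run like this is only a rough measurement, because JIT warm-up can distort the first call.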
We got this result:
1.3 the snappy summary
As we can see, Snappy is very fast, but it has a low compression ratio. For my large random file, the compressed output was even bigger than the original!
2. The bzip2 method
bzip2 is a compressor with a very high compression ratio. The project describes it as follows:
bzip2 is a freely available, patent free (see below), high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression.
The Apache Software Foundation provides the Commons Compress library, which includes a bzip2 implementation. Here is an example.
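Commons Compress exposes bzip2 as a pair of wrapping streams rather than a `byte[]` API, so a round trip follows the usual Java stream pattern. The sketch below illustrates that pattern only. The JDK has no bzip2 implementation, so GZIP streams are used here as a stand-in, and the `Wrapper` interface and the `compress`/`decompress` helpers are hypothetical names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamCodecDemo {

    /** Hypothetical helper type: wraps a raw stream in a compressing or decompressing stream. */
    @FunctionalInterface
    public interface Wrapper<S> {
        S wrap(S raw) throws IOException;
    }

    /** Writes all input through the compressing stream and returns the compressed bytes. */
    public static byte[] compress(byte[] input, Wrapper<OutputStream> compressor) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper is what flushes the final block and trailer into the sink.
        try (OutputStream out = compressor.wrap(sink)) {
            out.write(input);
        }
        return sink.toByteArray();
    }

    /** Reads the decompressing stream to its end and returns the restored bytes. */
    public static byte[] decompress(byte[] compressed, Wrapper<InputStream> decompressor) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = decompressor.wrap(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
            }
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive sample text, chosen only so that the stand-in codec has something to compress.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            text.append("compress me ");
        }
        byte[] original = text.toString().getBytes(StandardCharsets.UTF_8);

        // GZIP is a stand-in; see the note below for the bzip2 equivalents.
        byte[] packed = compress(original, GZIPOutputStream::new);
        byte[] restored = decompress(packed, GZIPInputStream::new);

        System.out.println("original bytes:   " + original.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(original, restored));
    }
}
```

With Commons Compress on the classpath (Maven coordinates `org.apache.commons:commons-compress`), passing `BZip2CompressorOutputStream::new` and `BZip2CompressorInputStream::new` from the `org.apache.commons.compress.compressors.bzip2` package should give the bzip2 version of the same round trip, since both classes have single-argument stream constructors that declare `IOException`.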
2.1 the pom.xml
2.2 the example code
2.3 the result
Running the code, we got this:
2.4 bzip2 summary
As we can see, bzip2 is slower than Snappy, but it has a higher compression ratio: the compressed file is 75% of the original size. It's awesome!
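One possible reading of the two results, assuming the test file held random text rather than random raw bytes (the post does not say which): text drawn from a 64-character alphabet carries only about 6 bits of information per 8-bit byte. A compressor with an entropy-coding stage, such as bzip2 with its Huffman coding, can squeeze that out, while the Snappy format has no entropy-coding stage and finds few repeated sequences to exploit. The sketch below illustrates the effect with the JDK's DEFLATE restricted to Huffman coding as a stand-in; `EntropyDemo` and its methods are hypothetical names.

```java
import java.util.Base64;
import java.util.Random;
import java.util.zip.Deflater;

public class EntropyDemo {

    /** Compressed size divided by original size, using JDK DEFLATE limited to Huffman coding. */
    public static double huffmanRatio(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setStrategy(Deflater.HUFFMAN_ONLY); // entropy coding only, no match search
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[64 * 1024];
        long compressedSize = 0;
        while (!deflater.finished()) {
            compressedSize += deflater.deflate(buffer);
        }
        deflater.end();
        return (double) compressedSize / input.length;
    }

    /** Uniformly random raw bytes: all 256 byte values are equally likely. */
    public static byte[] randomBytes(int length, long seed) {
        byte[] data = new byte[length];
        new Random(seed).nextBytes(data);
        return data;
    }

    /** Random text over the 64-character Base64 alphabet (6 bits of information per byte). */
    public static byte[] randomBase64Text(int rawLength, long seed) {
        return Base64.getEncoder().encode(randomBytes(rawLength, seed));
    }

    public static void main(String[] args) {
        byte[] raw = randomBytes(3 * 1024 * 1024, 1L);
        byte[] text = randomBase64Text(3 * 1024 * 1024, 2L);
        System.out.printf("random bytes ratio: %.3f%n", huffmanRatio(raw));
        System.out.printf("random text ratio:  %.3f%n", huffmanRatio(text));
    }
}
```

Under these assumptions the random bytes should come out at roughly 100% of their original size, and the random text at roughly 75%, which is 6/8.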
Summary
I recommend Snappy when speed is the priority; if you care more about the compression ratio, choose bzip2.
You can find detailed documentation about Snappy and bzip2 here: