Reading the uncompressed GZIP file size in Ruby without decompression

There are cases where you have a compressed GZIP file for which you want to determine the uncompressed data size without having to extract it.

For example, if you serve large text-based documents, you may want to display small ones directly in the browser and offer bigger ones as a file download, deciding based on the uncompressed size.

Luckily for us, the GZIP file format specification includes the following statement:

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

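This means that as long as the uncompressed payload is smaller than 4 GiB (2^32 bytes), the ISIZE value equals the uncompressed data size. For larger payloads the value wraps around, and for a file made of several concatenated GZIP members it describes only the last member.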
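To make the "modulo 2^32" rule concrete, here is a small sketch of what ISIZE would hold for payloads on either side of the limit. The `isize_for` helper and the `GIB` constant are hypothetical names used only for this illustration; they are not part of Ruby or Zlib.

```ruby
# Hypothetical helper: the value a GZIP trailer would store in ISIZE
# for a payload of the given size, following the "modulo 2^32" rule.
def isize_for(uncompressed_bytes)
  uncompressed_bytes % (2**32)
end

GIB = 1024**3

# Below the limit, ISIZE matches the real size.
puts isize_for(3 * GIB) == 3 * GIB
# Above the limit, the value wraps around: 5 GiB is reported as 1 GiB.
puts isize_for(5 * GIB) == 1 * GIB
```

If you expect payloads anywhere near that limit, treat ISIZE as a hint rather than a guarantee.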

You can get it in Ruby by combining #seek, #read and #unpack1 as follows:

# Open the file in binary mode; the block form closes it automatically
size = File.open('data.gzip', 'rb') do |file|
  # Move to the last 4 bytes, where ISIZE is stored
  file.seek(-4, IO::SEEK_END)
  # Read ISIZE and decode it as a 32-bit little-endian unsigned integer
  # ('I' would use the platform's native int size and byte order instead)
  file.read(4).unpack1('V')
end
# Print the result
puts "Your uncompressed data size: #{size} bytes"
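If you want to convince yourself that the trailer really holds the original size, a quick round trip is one way to check. The sketch below compresses a known payload with Ruby's bundled Zlib, then reads ISIZE back. The `gzip_uncompressed_size` helper name and the sample payload are assumptions made for illustration, and the helper assumes a single-member GZIP file smaller than 4 GiB. It decodes the value with the 'V' directive, since GZIP stores multi-byte fields in little-endian order.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: reads ISIZE from the last 4 bytes of a GZIP file.
# Assumes a single-member file with an uncompressed size below 4 GiB.
def gzip_uncompressed_size(path)
  File.open(path, 'rb') do |file|
    file.seek(-4, IO::SEEK_END)
    # 'V' = 32-bit unsigned integer, little-endian
    file.read(4).unpack1('V')
  end
end

# Sample payload, made up for this check
payload = 'hello world' * 1_000

reported = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.gz')
  # Compress the payload into a temporary GZIP file
  Zlib::GzipWriter.open(path) { |gz| gz.write(payload) }
  gzip_uncompressed_size(path)
end

puts "Actual size: #{payload.bytesize} bytes, ISIZE: #{reported} bytes"
```

Both numbers should match, and the check never decompresses the file.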

Cover photo by Daniel Go, licensed under Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0). Image has been cropped to 766x450px.

The post Reading the uncompressed GZIP file size in Ruby without decompression appeared first on Closer to Code.