Reading the uncompressed GZIP file size in Ruby without decompression

Sometimes you have a GZIP-compressed file and want to know the size of the uncompressed data without extracting it.

For example, if you work with large text-based documents, you can use the uncompressed size to decide whether to display the content directly in the browser or offer it as a file download on request.

Luckily for us, the GZIP file format specification (RFC 1952) defines the following trailer at the end of each compressed member:

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

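This means that as long as the uncompressed payload is smaller than 4 GiB (2^32 bytes), the ISIZE value equals the uncompressed data size. For larger payloads the value wraps around, and for a file made of several concatenated gzip members it describes only the last member.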
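To make the "modulo 2^32" part concrete, here is a small arithmetic sketch. The reported_isize helper is hypothetical and only mimics what ends up in the trailer; it does not read any file.

```ruby
# Hypothetical helper: the value a gzip trailer would store in ISIZE
# for a payload of the given size (the size modulo 2^32).
def reported_isize(actual_bytes)
  actual_bytes % (2**32)
end

gib = 2**30

# Below 4 GiB the stored value is the real size
puts reported_isize(3 * gib) == 3 * gib # true
# At 5 GiB the value wraps around and reports 1 GiB
puts reported_isize(5 * gib) == 1 * gib # true
```

So if your files may exceed 4 GiB, treat ISIZE as a hint rather than as the real size.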

You can read it in Ruby by combining #seek, #read and #unpack1 as follows:

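# Open the file in binary mode; the block form closes it automatically
size = File.open('data.gzip', 'rb') do |file|
  # Move to the last 4 bytes of the file, where ISIZE is stored
  file.seek(-4, IO::SEEK_END)
  # Read them and decode as a 32-bit little-endian unsigned integer.
  # 'V' is used instead of 'I' because 'I' depends on the platform's
  # native byte order and integer size, while GZIP always stores
  # ISIZE least significant byte first.
  file.read(4).unpack1('V')
end
# Print the result
puts "Your uncompressed data size: #{size} bytes"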
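If you want to check this approach end to end, the following is a minimal sketch that writes a small gzip file with Ruby's standard Zlib library and compares the trailer value with the real payload size. The gzip_isize helper, the sample.gz file name and the magic-bytes check are my own assumptions for illustration, not part of the snippet above.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: returns the ISIZE trailer field of a gzip file.
def gzip_isize(path)
  File.open(path, 'rb') do |file|
    # Every gzip member starts with the magic bytes 0x1f 0x8b
    unless file.read(2) == "\x1F\x8B".b
      raise ArgumentError, "#{path} does not look like a gzip file"
    end

    # ISIZE is the last 4 bytes, stored as little-endian unsigned 32-bit
    file.seek(-4, IO::SEEK_END)
    file.read(4).unpack1('V')
  end
end

payload = 'hello gzip ' * 1_000

isize = Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.gz')
  # Compress the payload into a single-member gzip file
  Zlib::GzipWriter.open(path) { |gz| gz.write(payload) }
  gzip_isize(path)
end

puts "Payload size: #{payload.bytesize} bytes"
puts "ISIZE value:  #{isize} bytes"
puts "Match: #{isize == payload.bytesize}"
```

The check on the first two bytes is only a cheap guard against reading the last four bytes of a file that is not gzip at all; it does not validate the rest of the archive.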

Cover photo by Daniel Go on Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0). Image has been cropped to 766x450px.

The post Reading the uncompressed GZIP file size in Ruby without decompression appeared first on Closer to Code.
