How to reduce data storage costs by up to 50% with Ceph

Canonical Ceph with Intel Quick Assist Technology (QAT)

Photo by Mathieu Turle on Unsplash

In our last blog post we talked about how you can use Intel® QAT with Canonical Ceph, today we’ll cover why this technology is important from a business perspective – in other words, we’re talking data storage costs.

Retaining and protecting data has an inherent cost based on the underlying architecture of the system used to store it. In the public cloud this is very easy to understand, as each GB stored incurs a per unit fee, with additional costs based on how frequently the storage is accessed (read more about that in our blog post on cloud storage costs here).

On premise solutions are typically more complex to calculate a cost-per-gigabyte value as you also need to take into account the power, cooling, physical space, as well as the hardware including networking, and ongoing maintenance.  For simplicity’s sake, in this post we will examine a typical storage server configuration, focusing on the hardware components themselves, as environmental costs can vary widely across the globe, but would remain constant if either configuration was deployed.

The most important aspect is to understand the impact of a hardware decision and how it can affect the total cost of ownership (TCO), as we’ll explore below.

See also  How to Install Minecraft on Linux [Easy Steps]

Diving in: hardware comparisons

Using a well known vendor’s online configuration tool we can examine the cost differences between similar server configurations. As we are focusing on compression the only difference between the configurations will be the CPUs, with and without hardware offload (QAT).

However, it should be noted that the largest driver of cost in any storage system are the disks themselves, for example the list price of a 15.36TB NVMe drive is $9,396.63, which means that in a single cluster node just the cost of disks is over $225,000, with the remainder of the components costing approximately $20,000.

By swapping out the CPU in a server configuration we can explore how much additional cost would be introduced by adding QAT offload engines, in the examples below we have selected equivalent CPUs with and with and without QAT.  Both server configurations provide 368.64TB of raw storage space, and the per GB costs below assume a 3-Replica protection scheme.

Server ConfigurationNo QATQAT enabled
Processor2x Xeon 6448Y2x Xeon 6548N
Memory256GB RAM256GB RAM
OSD disks24x 15.36TB NVMe24x 15.36TB NVMe
NetworkingDual 100GbEDual 100GbE
Boot disks2x 1TB SSD 2x 1TB SSD 
Total cost$242,472$243,032
Per GB cost (3-Replica)$1.89/GB$1.90/GB
Comparable server configurations with and without hardware offload.

What savings does compression bring to data storage costs?

Based on the undiscounted list prices we can see that adding QAT does indeed increase the cost of a Ceph storage server by $560, or an increase of 0.25%, but this can easily be justified when considering a dataset that can be compressed.  As can be seen below, even data that is already stored in a compressed format such as JPEG, can benefit from inline compression in a Ceph storage system, with other data types seeing greater compression and thus space savings.

See also  How to Edit Files From Your Linux VPS Terminal
Data SetTypeCompression RatioSpace Saving
MinIO Warp (400GB)Synthetic CSV1.3325%
Images (1.1GB)5000 Jpgs1.011%
COVID-19 Research Data (100GB)Json, CSV, Text1.3325%
Video (200GB)RAW YUV3.1368%
Video (110GB)H.2641.000%
Compression ratios experienced with different data types.

Using the server configuration that includes QAT offload engines, the effective cost to store a single GB of data is $1.90. As the compression ratio increases so do the savings as can be seen below:

Compression %01020304050
Capacity GB383,386425,984479,232547,694638,976766,771
Cost per GB$1.90$1.71$1.52$1.33$1.14$0.95
Decreasing cost per GB with greater compression levels
How to reduce data storage costs by up to 50% with ceph 4

Key takeaways

While initially, it is natural to think that if you add additional hardware, or change a processor model for another with additional features, there will be additional costs.  Which as we can see in these server examples is technically true, at first glance. However, in the scenario above changing the CPU model is just 0.25% of the total upfront cost, but once we start to take into account the savings provided by the hardware offload compression to actual TCO per GB stored can fall dramatically, and with some types of data, like raw video over 50% is possible!

Even so, for data sets that can be compressed the additional cost is easily mitigated, and for data that has the greatest compression ratios the cost savings become significant.

See also  How to use SAR Command in Linux

The greater the compression ratio, the less capacity required, which in turns reduces the number of storage servers required, and less network ports, which leads to a reduction in power consumption and facility costs too! These savings apply to all types of disks, including lower cost NL-SAS, as the compression is applied inline as the object storage damon (OSD) processes client IO.

Be sure to read our previous blog post if you want to try out Ceph with QAT.

Join our webinar

Find out more about how Ceph and QAT can be used to improve storage efficiency in our upcoming webinar, Maximize your storage efficiency with Ceph, on 12th February 2025, at 5PM CET, 11AM ET.

How to reduce data storage costs by up to 50% with ceph 5

Additional resources


Discover more from Ubuntu-Server.com

Subscribe to get the latest posts sent to your email.

Comments

No comments yet. Why don’t you start the discussion?

    Leave a Reply