
A look forward to storage in 2022

It’s that time of year when we start to look ahead and think about the ongoing trends in our various industries. One thing is certain in the storage industry: demand for capacity remains high, and it continues to grow exponentially.

Growth, growth, growth

More and more data is being created every day. It truly is non-stop. In 2021 alone, enterprise storage vendors were predicted to ship almost 150 exabytes of capacity, and that number is only expected to increase again in 2022!


We now see 20TB hard drives on the market to help meet these needs, but we have to remain vigilant when building storage clusters, as the access speed of these drives has barely changed over the last few years. In failure scenarios, where we have to recreate replicas or erasure-coded shards of data, recovery can take many, many hours with drives of such high capacity.
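As a rough back-of-the-envelope sketch of why this matters (the ~200MB/s sustained throughput figure below is our own assumption, typical of large 7,200RPM drives, rather than a number from any vendor), the time just to re-fill a single replacement drive scales linearly with its capacity:

```python
# Back-of-the-envelope estimate of how long it takes simply to re-fill
# a replacement drive, assuming the rebuild is bottlenecked by one
# drive writing at a sustained sequential rate (~200 MB/s assumed).

def rebuild_hours(capacity_tb: float, throughput_mb_s: float = 200.0) -> float:
    """Hours to write capacity_tb of data at throughput_mb_s."""
    capacity_mb = capacity_tb * 1_000_000  # decimal TB -> MB
    return capacity_mb / throughput_mb_s / 3600

for tb in (4, 8, 20):
    print(f"{tb:>2}TB drive: ~{rebuild_hours(tb):.1f} hours")

# Output:
#  4TB drive: ~5.6 hours
#  8TB drive: ~11.1 hours
# 20TB drive: ~27.8 hours
```

Real systems such as Ceph recover degraded data from many drives in parallel, so actual recovery times vary widely; the point is that the worst case grows linearly with per-drive capacity.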

So the rule of thumb remains the same: a larger number of smaller drives leads to a more predictable system for any amount of capacity. Of course, you do have to remain pragmatic to balance capacity needs with the cost of increasing the number of spindles.

Flash: denser and faster

Over the last few years, we have seen huge leaps forward in capacity-orientated flash. Intel recently launched a 30TB QLC 3D NAND drive, surpassing even the largest of traditional spinning drives. Whilst we wouldn’t suggest using these for very write-heavy workloads, there is definitely a place for them in storage systems to lift throughput above traditional spindle-based configurations. There are power-usage benefits too, which become more and more important as large-scale clusters grow – and even at the edge, where power budgets might be quite limited!

Computational storage

An interesting and novel area in drive technology is the concept of computational storage: adding more intelligence to the hard drives and SSDs that we use in servers and storage clusters.

We have seen work in this area before, but the use case was too narrow. Seagate created a hard drive called Kinetic, which exposed a key/value object storage interface over Ethernet rather than the usual SAS or SATA block interfaces. This was interesting for those of us building larger-scale object stores: each hard drive added to a cluster brought an additional slice of compute resource with it, leading to a highly scalable sea of compute and storage. Furthermore, it shrank the failure domain to a single disk, rather than a whole server containing multiple disks. However, the concept didn’t gain much traction, as it required significant changes to the software used to build storage clusters – in the case of Ceph, for example, there simply weren’t enough resources on each drive to run an entire OSD.
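To make the difference in programming model concrete, here is a minimal, hypothetical sketch – the class and method names are invented for illustration and are not the real Kinetic wire protocol:

```python
# Hypothetical contrast between the two access models. These classes
# are illustrative only; real Kinetic drives spoke a network protocol,
# not a Python API.

class BlockDrive:
    """Traditional SAS/SATA view: a flat array of fixed-size sectors.
    The host's filesystem or object store must map objects onto LBAs."""

    SECTOR = 4096

    def __init__(self, sectors: int):
        self._data = bytearray(sectors * self.SECTOR)

    def write(self, lba: int, buf: bytes) -> None:
        offset = lba * self.SECTOR
        self._data[offset:offset + len(buf)] = buf

    def read(self, lba: int, length: int) -> bytes:
        offset = lba * self.SECTOR
        return bytes(self._data[offset:offset + length])


class KineticStyleDrive:
    """Key/value view: the drive stores named objects itself, so every
    disk added to a cluster brings its own object-mapping compute."""

    def __init__(self):
        self._objects: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: bytes) -> bytes:
        return self._objects[key]


drive = KineticStyleDrive()
drive.put(b"bucket/photo-001.jpg", b"...jpeg bytes...")
print(drive.get(b"bucket/photo-001.jpg"))
```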

Fast forward to 2021, and we see some smaller companies starting to offer products that keep the typical SAS and SATA interfaces but add capacity-efficiency features such as on-drive compression or encryption, without requiring any host processing power or changes to the software running on the server.


This is a lot like what we have already seen in the Ethernet space, where certain tasks are offloaded to SmartNICs. With some computationally aware storage devices, it is already possible to access the compute resources on the drives and use them for pre-processing datasets. In a storage system with thousands of drives, that amounts to a huge pool of additional computing power at your disposal.
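As a conceptual sketch of that idea – the offload_filter interface below is entirely hypothetical, standing in for whatever execution API a real computational-storage device might expose – pushing a filter down to the drives lets each one scan its own data and return only the matches:

```python
# Conceptual sketch of predicate push-down to computational storage.
# ComputationalDrive.offload_filter is hypothetical: on real hardware
# it would run on the drive's own cores, so only matching records
# cross the host interface.

from concurrent.futures import ThreadPoolExecutor

class ComputationalDrive:
    def __init__(self, records: list[bytes]):
        self._records = records  # data resident on this drive

    def offload_filter(self, predicate) -> list[bytes]:
        return [r for r in self._records if predicate(r)]

def scan_cluster(drives, predicate):
    # Each drive filters in parallel; the host only collects matches.
    with ThreadPoolExecutor() as pool:
        per_drive = pool.map(lambda d: d.offload_filter(predicate), drives)
    return [rec for matches in per_drive for rec in matches]

drives = [
    ComputationalDrive([b"error: disk 3 offline", b"ok: heartbeat"])
    for _ in range(4)
]
print(scan_cluster(drives, lambda rec: rec.startswith(b"error")))
```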

Data repatriation – the post-pandemic splurge

Over the last two years, we have all seen huge changes in the way that we work. To support that, many companies have turned to public clouds to help them scale their operations immediately and maintain business as usual. Cost optimisation has largely been a secondary consideration.

However, as companies have settled into these new ways of operating, we now see a renewed focus on cost optimisation and efficiency. Storage remains the least cloud-friendly piece of infrastructure: usage is typically static or growing, without the peaks and troughs that compute workloads have.

More and more companies are waking up to the costs of storing data in the cloud and are considering near-cloud solutions, where they operate their own hardware in co-location facilities adjacent to major cloud provider facilities and link the two together with private interconnects. Not only does this reduce costs immediately, it also means there are no penalties when migrating to other cloud providers in the future!

Wrap up

Open source storage solutions such as Ceph can readily help solve the growth and scaling challenges seen across the industry. Learn more about deploying Ceph in our recent webinar.

We wish you all Happy Holidays and a wonderful New Year!
