The role of secure data storage in fueling AI innovation

There is no AI without data

Artificial intelligence is the most exciting technology revolution of recent years. Nvidia, Intel, AMD and others continue to produce faster and faster GPUs, enabling larger models and higher throughput in decision-making processes.

Outside of the immediate AI hype, one area remains somewhat overlooked: AI needs data. First and foremost, storage systems need to provide high-performance access to ever-growing datasets, but more importantly they need to ensure that this data is stored securely, not just for the present, but also for the future.

There are multiple different types of data used in typical AI systems:

Raw and pre-processed data
Training data
Models
Results

All of this data takes time and computational effort to collect, process and output, and as such needs to be protected. In some cases, like telemetry data from a self-driving car, it might be impossible to reproduce the data. Even after training data is used to create a model, its value is not diminished; improvements to models require consistent training data sets so that any adjustments can be fairly benchmarked.

Raw, pre-processed, training and results data sets can contain personally identifiable information, so steps need to be taken to ensure that they are stored securely. Beyond the moral responsibility of storing data safely, there can be significant penalties associated with data breaches.

Challenges with securely storing AI data

We covered many of the risks associated with securely storing data in an earlier blog post. The same risks apply in an AI setting as well. After all, machine learning is another application that consumes storage resources, albeit sometimes at a much larger scale.

AI use cases are relatively new, but the majority of modern storage systems, including open source solutions like Ceph, have mature features that can be used to mitigate these risks.

Physical theft thwarted by data at rest encryption

Any disk used in a storage system could theoretically be lost to theft, or leave your control when it is returned for warranty replacement after a failure. With encryption at rest, every byte of data stored on a disk, whether spinning media or flash, is useless without the cryptographic keys needed to decrypt it. This protects sensitive data, as well as proprietary models created after hours or even days of processing.

Strict access control to keep out uninvited guests

A key tenet of any system design is ensuring that users (real people, or headless accounts) have access only to the resources they need, and that this access can easily be removed at any time. Storage systems like Ceph use their own access control mechanisms and also integrate with centralised authentication systems like LDAP to make access control easier to manage.
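The least-privilege idea can be sketched in a few lines of Python. This is an illustration only: the `AccessPolicy` class, the account name and the pool names are all hypothetical, and it does not reflect how Ceph or LDAP represent permissions internally. It shows the two properties described above: anything not explicitly granted is denied, and a single call removes all of an account's access.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical in-memory policy: account -> pool -> set of permissions."""

    grants: dict = field(default_factory=dict)

    def grant(self, account, pool, perms):
        # Add permissions for one account on one pool only.
        self.grants.setdefault(account, {}).setdefault(pool, set()).update(perms)

    def revoke(self, account):
        # Dropping the account removes every grant it held in one step.
        self.grants.pop(account, None)

    def is_allowed(self, account, pool, perm):
        # Default deny: anything not explicitly granted is refused.
        return perm in self.grants.get(account, {}).get(pool, set())


# A headless training job may read training data and write models, nothing else.
policy = AccessPolicy()
policy.grant("training-job", "training-data", {"read"})
policy.grant("training-job", "models", {"read", "write"})
```

With this policy, `policy.is_allowed("training-job", "raw-telemetry", "read")` is refused because no grant exists for that pool, and `policy.revoke("training-job")` withdraws everything at once.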

Eavesdropping defeated by in flight encryption

There is nothing worse than someone listening in on a conversation that they should not be privy to. The same thing can happen in computer networks too. Encrypting all network flows, both between clients and storage and on the storage system's internal networks, prevents data from leaking to third parties eavesdropping on the network.
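On the client side, encryption in flight usually means refusing to talk to the storage endpoint over anything other than verified TLS. The sketch below uses Python's standard `ssl` module to build such a client context. The helper name `make_client_context` and the optional `ca_file` parameter are assumptions made for illustration, and the example covers only the client-to-storage leg, not how a storage system such as Ceph encrypts its own internal traffic.

```python
import ssl


def make_client_context(ca_file=None):
    """Build a client TLS context that rejects unverified or legacy connections."""
    # create_default_context() turns on certificate and hostname verification.
    # ca_file is an optional path to a private CA bundle (hypothetical here).
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
```

A socket to the storage endpoint would then be wrapped with `context.wrap_socket(sock, server_hostname=...)`, so the handshake fails if the server's certificate cannot be verified.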

Recover from ransomware with snapshots and versioning

It seems like every week another large enterprise has to disclose a ransomware event, where an unauthorised third party has taken control of their systems and encrypted the data. This leads not only to downtime, but also to the possibility of having to pay a ransom for the decryption key to regain control of their systems and access to their data. AI projects often represent a significant investment of both time and resources, so having an initiative undermined by a ransomware attack could be highly damaging.

Using point-in-time snapshots or object versioning can allow an organisation to revert to a previous state, from before the attacker encrypted the data, and potentially resume operations sooner.
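To make the recovery idea concrete, here is a toy in-memory object store that keeps every version of each object. The `VersionedStore` class and the object name are hypothetical and are not Ceph's API. The point is only that a write adds a new version instead of destroying the old one, so a known-good version can be restored after a malicious overwrite.

```python
class VersionedStore:
    """Toy in-memory object store that keeps every version of each key."""

    def __init__(self):
        self._versions = {}  # key -> list of bytes, oldest first

    def put(self, key, data):
        # A write appends a new version; it returns that version's index.
        history = self._versions.setdefault(key, [])
        history.append(bytes(data))
        return len(history) - 1

    def get(self, key, version=-1):
        # By default, return the newest version.
        return self._versions[key][version]

    def restore(self, key, version):
        # Restoring re-publishes an older version as the newest one.
        return self.put(key, self.get(key, version))


store = VersionedStore()
good = store.put("model.bin", b"trained weights")
store.put("model.bin", b"\x00ransomware ciphertext")  # attacker overwrites the object
store.restore("model.bin", good)  # roll back to the known-good version
```

After the restore, `store.get("model.bin")` returns the original bytes again. In a real system this only helps if the compromised account cannot also delete the older versions or snapshots.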

Learn more

Ceph is one storage solution that can be used to store various AI datasets, and is not only scalable to meet performance and capacity requirements, but also has a number of features to ensure data is stored securely.

Find out more about Ceph here.

Ubuntu Server Admin