The librdkafka Supply Chain Breakdown: rdkafka-ruby’s Darkest Hour

Opening Note

We all make mistakes, and fundamentally, the havoc caused by this incident was due to a flaw in the design of rdkafka-ruby. While the disappearance of librdkafka from GitHub was unexpected, this article aims to explain how rdkafka-ruby should have been protected against it and what was poorly designed. By examining this incident, I hope to provide insights into better practices for managing dependencies and producing more resilient software builds in the Ruby ecosystem.

Incident Summary

On July 10, 2024, at 15:47 UTC, users of the rdkafka gem started seeing installation failures when the librdkafka repository on GitHub unexpectedly went private. This break in the supply chain disrupted installations, causing widespread frustration and, in many cases, completely blocking the ability to deploy rdkafka-based software.

Fetching rdkafka 0.16.0
Installing rdkafka 0.16.0 with native extensions
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.

    current directory: /rdkafka-0.16.0/ext
/usr/local/bin/ruby -rrubygems /rake-13.2.1/exe/rake RUBYARCHDIR=/home/circleci/.rubygems/extensions/x86_64-linux/3.3.0/rdkafka-0.16.0 RUBYLIBDIR=/home/circleci/.rubygems/extensions/x86_64-linux/3.3.0/rdkafka-0.16.0
2 retrie(s) left for v2.4.0 (404 Not Found)
1 retrie(s) left for v2.4.0 (404 Not Found)
0 retrie(s) left for v2.4.0 (404 Not Found)
404 Not Found
rake aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - ports/archives/v2.4.0 (Errno::ENOENT)
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:496:in `verify_file'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:133:in `block in download'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:131:in `each'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:131:in `download'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:232:in `cook'
/rdkafka-0.16.0/ext/Rakefile:38:in `block in '
/rake-13.2.1/exe/rake:27:in `'
Tasks: TOP => default
(See full trace by running task with --trace)

Detailed Explanation

The rdkafka gem used to rely on downloading librdkafka from the Confluent GitHub repository during the installation process. As a huge proponent of immutable builds that do not depend on external resources, I had planned to change this model for a long time. Several months ago, I created a GitHub issue to track this transition. However, the change was delayed due to other priorities within the Karafka ecosystem. Unfortunately, this delay resulted in the recent outage.

# Just the relevant code here

recipe.files << {
  :url => "https://codeload.github.com/edenhill/librdkafka/tar.gz/v#{Rdkafka::LIBRDKAFKA_VERSION}",
  :sha256 => Rdkafka::LIBRDKAFKA_SOURCE_SHA256
}

recipe.configure_options = ["--host=#{recipe.host}"]
recipe.cook

This setup meant that during the bundle install process, the required librdkafka source was fetched and compiled on the fly, which inherently relied on the availability of the external GitHub repository.

Upon discovery, it took me 59 minutes to release the first patched version and approximately four hours to prepare fixes and backport them to all relevant versions of the rdkafka gem, including older ones. Luckily, I was in front of my computer when the incident occurred, which allowed me to quickly create and release the needed fixes.

Future Steps

Going forward, all future releases will depend only on RubyGems, with no reliance on external build sources like GitHub. I decided to ship the librdkafka releases inside the gem itself, which improves both the reliability of the gem and the stability of the ecosystem around it.

releases = File.expand_path(File.join(File.dirname(__FILE__), '../dist'))

recipe.files << {
  :url => "file://#{releases}/librdkafka_#{Rdkafka::LIBRDKAFKA_VERSION}.tar.gz",
  :sha256 => Rdkafka::LIBRDKAFKA_SOURCE_SHA256
}
recipe.configure_options = ["--host=#{recipe.host}"]
recipe.cook
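Since the archive now ships inside the gem, a release-time check that the vendored file is present and matches the pinned checksum can catch packaging mistakes before users do. The following is only a minimal sketch of that idea: `verify_vendored_archive!` and `ArchiveVerificationError` are hypothetical names and not part of rdkafka-ruby, and mini_portile2 already verifies the `:sha256` passed in `recipe.files` at install time, so this would be an extra guard for something like a release script or CI job.

```ruby
require "digest"

# Hypothetical error class used only in this sketch.
class ArchiveVerificationError < StandardError; end

# Hypothetical helper (not part of rdkafka-ruby): confirms that a vendored
# archive exists on disk and that its SHA256 matches the expected value.
# Returns the path on success and raises ArchiveVerificationError otherwise.
def verify_vendored_archive!(path, expected_sha256)
  unless File.file?(path)
    raise ArchiveVerificationError, "vendored archive missing: #{path}"
  end

  actual = Digest::SHA256.file(path).hexdigest
  return path if actual == expected_sha256.downcase

  raise ArchiveVerificationError,
        "checksum mismatch for #{path}: expected #{expected_sha256}, got #{actual}"
end
```

In a release task, this could be called with the same path and checksum constant that the build uses, so a missing or stale archive fails the release rather than the user's `bundle install`.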

Fragility of the OSS Supply Chain

This incident highlights our dependence on other OSS projects and repositories. It’s essential to remember that mistakes can happen, and we must be prepared. This wasn’t the first issue with GitHub downloads. In 2023, a change in GitHub’s tar layout broke a lot of software, including ours, that relied on checksums for artifact verification. To be honest, if we had migrated the build process of rdkafka at that time, this article would not have had to be written.
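As a rough illustration of why checksums over generated archives are fragile, the sketch below packs the same payload with two different compression settings and compares digests. It is not a reproduction of what GitHub changed; the payload is a placeholder and `compare_packed_digests` is a hypothetical helper introduced only for this example.

```ruby
require "digest"
require "zlib"

# Hypothetical helper for illustration: packs the same payload twice with
# different zlib compression levels and reports whether the digests of the
# packed bytes and of the unpacked content agree.
def compare_packed_digests(payload)
  packed_a = Zlib::Deflate.deflate(payload, Zlib::NO_COMPRESSION)
  packed_b = Zlib::Deflate.deflate(payload, Zlib::BEST_COMPRESSION)

  {
    # Digest over the packed bytes, which is what an archive checksum covers.
    packed_match: Digest::SHA256.hexdigest(packed_a) == Digest::SHA256.hexdigest(packed_b),
    # Digest over the content after unpacking.
    content_match: Digest::SHA256.hexdigest(Zlib::Inflate.inflate(packed_a)) ==
                   Digest::SHA256.hexdigest(Zlib::Inflate.inflate(packed_b))
  }
end

result = compare_packed_digests("placeholder source tree\n" * 100)
puts "packed digests match:  #{result[:packed_match]}"
puts "content digests match: #{result[:content_match]}"
```

The content is identical in both cases, yet a checksum pinned to the packed bytes only stays valid while whoever generates the archive keeps producing exactly the same bytes. Shipping a fixed archive inside the gem removes that dependency.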

Here are my main takeaways from this incident:

  1. Design Flaws Can Amplify Issues: The incident highlighted how design flaws in dependency management can lead to significant disruptions.
  2. Dependency on External Repositories: Relying on external data sources during the build process poses risks, especially when unexpected changes occur.
  3. Importance of Immutable Builds: Adopting immutable builds without external resources can enhance reliability and stability.

The post The librdkafka Supply Chain Breakdown: rdkafka-ruby’s Darkest Hour first appeared on Closer to Code.
