We all make mistakes, and fundamentally, the havoc caused by this incident was due to a flaw in the design of rdkafka-ruby. While the disappearance of librdkafka from GitHub was unexpected, this article aims to clarify and explain how rdkafka-ruby should have prevented it and what was poorly designed. By examining this incident, I hope to provide insights into better practices for managing dependencies and ensuring more resilient software builds for the Ruby ecosystem.
On July 10, 2024 15:47 UTC, users of the rdkafka gem faced issues when the librdkafka repository on GitHub unexpectedly went private. This break in the supply chain disrupted installations, causing widespread frustration and, in many cases, completely blocking the ability to deploy rdkafka-based software.
Fetching rdkafka 0.16.0
Installing rdkafka 0.16.0 with native extensions
Gem::Ext::BuildError: ERROR: Failed to build gem native extension.
current directory: /rdkafka-0.16.0/ext
/usr/local/bin/ruby -rrubygems
/rake-13.2.1/exe/rake
RUBYARCHDIR=/home/circleci/.rubygems/extensions/x86_64-linux/3.3.0/rdkafka-0.16.0
RUBYLIBDIR=/home/circleci/.rubygems/extensions/x86_64-linux/3.3.0/rdkafka-0.16.0
2 retrie(s) left for v2.4.0 (404 Not Found)
1 retrie(s) left for v2.4.0 (404 Not Found)
0 retrie(s) left for v2.4.0 (404 Not Found)
404 Not Found
rake aborted!
Errno::ENOENT: No such file or directory @ rb_sysopen - ports/archives/v2.4.0
(Errno::ENOENT)
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:496:in
`verify_file'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:133:in
`block in download'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:131:in
`each'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:131:in
`download'
/mini_portile2-2.8.7/lib/mini_portile2/mini_portile.rb:232:in
`cook'
/rdkafka-0.16.0/ext/Rakefile:38:in `block
in '
/rake-13.2.1/exe/rake:27:in `'
Tasks: TOP => default
(See full trace by running task with --trace) The rdkafka gem used to rely on downloading librdkafka from the Confluent GitHub repository during the installation process. As a huge proponent of immutable builds that do not depend on external resources, I planned to change this model for a long time. Several months ago, I created a GitHub issue to address this transition. However, the change was delayed due to other priorities within the karafka ecosystem. Unfortunately, this delay resulted in the recent outage.
# Just the relevant code here
recipe.files << {
:url => "https://codeload.github.com/edenhill/librdkafka/tar.gz/v#{Rdkafka::LIBRDKAFKA_VERSION}",
:sha256 => Rdkafka::LIBRDKAFKA_SOURCE_SHA256
}
recipe.configure_options = ["--host=#{recipe.host}"]
recipe.cook This setup meant that during the bundle install process, the required librdkafka source was fetched and compiled on the fly, which inherently relied on the availability of the external GitHub repository.
Upon discovery, it took me 59 minutes to release the first patched version and approximately four hours to prepare fixes and backport them to all relevant versions of the rdkafka gem, including older ones. Luckily, I was in front of my computer when the incident occurred, allowing me to quickly create and release needed fixes.
Going forward, all future releases will depend only on RubyGems, ensuring no reliance on external build sources like GitHub. I decided to ship the librdkafka releases inside the gem itself, enhancing its reliability and stability of the ecosystem.
releases = File.expand_path(File.join(File.dirname(__FILE__), '../dist'))
recipe.files << {
:url => "file://#{releases}/librdkafka_#{Rdkafka::LIBRDKAFKA_VERSION}.tar.gz",
:sha256 => Rdkafka::LIBRDKAFKA_SOURCE_SHA256
}
recipe.configure_options = ["--host=#{recipe.host}"]
recipe.cook This incident highlights our dependence on other OSS projects and repositories. It’s essential to remember that mistakes can happen, and we must be prepared. This wasn’t the first issue with GitHub downloads. In 2023, a change in GitHub’s tar layout broke a lot of software, including ours, that relied on checksums for artifacts verification. To be honest, if we had migrated the building process of rdkafka at that time, this article would not have to be written.
Here are my main takeaways from this incident:
The post The librdkafka Supply Chain Breakdown: rdkafka-ruby’s Darkest Hour first appeared on Closer to Code.
Here’s the guide to deploy Elastic Stack on Ubuntu VPS, with secure access, HTTPS proxying,…
This guide walks through deploying Nagios Core on an Ubuntu VPS, from system prep to…
After a decade under Pablo Cantero's stewardship, Shoryuken has a new maintainer - me. I'm…
MAAS 3.7 has been officially released and it includes a bunch of cool new features.…
Update: This article originally concluded that Eisel-Lemire wasn't worth it for Ruby. I was wrong.…
Recently, the team at MinIO moved the open source project into maintenance mode and will…