How Canonical Support solves hard Linux performance bugs – even in 12-year-old code

Some support cases are straightforward. Others lead deep into legacy code, where a single logic bug can quietly turn a routine command into a major performance problem. This series looks at how Canonical Support and Sustaining Engineering work together to investigate, patch, and upstream difficult issues that standard troubleshooting alone cannot solve. In this second post, a 12-year-old bug in libnss-db caused getent enumeration to slow to a crawl – and showed how far expert support can go when a customer brings the right evidence and the right question.

What went wrong

`getent` is a standard Linux command that queries the system’s name service for information such as users, groups, hosts, and services. In this case, a customer was using it to enumerate groups on Ubuntu and reported severe performance problems with the `nssdb` backend. That backend is one way Linux can provide this information: it reads account and group data from Berkeley DB files instead of from LDAP, local flat files, or other identity sources.
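To make "enumeration" concrete, here is a minimal Python sketch that walks the whole group database through the name service and times the walk. It is an illustration, not the command the customer ran: it assumes a Unix system where Python's standard `grp` module is available, and which backends answer the query depends on the local name service configuration.

```python
import grp
import time

# Walk the entire group database through the system's name service,
# similar in spirit to running `getent group` with no key.
start = time.perf_counter()
groups = grp.getgrall()
elapsed = time.perf_counter() - start

# Each entry carries the group name, GID, and member list.
for entry in groups[:5]:
    print(entry.gr_name, entry.gr_gid, entry.gr_mem)

print(f"enumerated {len(groups)} groups in {elapsed:.3f} s")
```

On a healthy system the walk finishes almost instantly, so a timing like this is a rough way to notice when enumeration cost starts growing with directory size.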

In an environment with more than 24,000 user and group entries, enumeration had become so slow that the backend was effectively unusable for the customer’s workload. The customer had already tested alternatives and found they did not deliver the performance profile they needed.

Starting with the evidence

Solving common issues in new packages is hard enough, but this investigation had an added difficulty: the problem lived in an older component that had not been actively touched in more than 12 years.

The customer had already narrowed the issue to libnss-db, the legacy name service library that implements the `nssdb` backend. They also pointed to one small but important piece of logic: stayopen.

Reproducing the slowdown

The Canonical support engineer soon determined that stayopen handling was the likely source. Stayopen is a flag that determines whether the database connection remains open across repeated lookups or is reopened for each operation.

The support engineer reproduced the issue and confirmed that the performance degradation was both real and severe. However, what initially looked like a generic lookup slowdown turned out to be repeated database activity during enumeration, with the cost of each operation compounding across a very large directory. At that scale, the result was a system that could no longer complete the work in a reasonable time.

That result shifted the investigation from surface-level troubleshooting to source-level analysis. Solving the problem meant examining libnss-db itself: an independent package whose C source had remained largely unchanged for more than a decade.

Digging into legacy C code

At that point, the work became a code-level deep dive for our Support team. Our engineer traced how libnss-db handled Berkeley DB access during enumeration and followed the control flow through code that, in some places, was roughly 12 years old.

Our engineer soon found the source of the problem: a logic bug in how the library handled database connections during enumeration. The library was repeatedly opening and closing the database files instead of keeping the connection open throughout the enumeration sequence. This was not the kind of issue that can be solved with a quick setting change or a routine package update.
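The difference between the two behaviors can be shown with a toy model. The sketch below is not libnss-db's code: `CountingDb` and `enumerate_entries` are hypothetical names invented for illustration, and the "database" is just an in-memory list that counts how often it is opened. The record count of 24,000 is a round number chosen to match the scale described earlier.

```python
class CountingDb:
    """Hypothetical stand-in for a database file; counts how often it is opened."""

    def __init__(self, records):
        self._records = list(records)
        self._is_open = False
        self.open_count = 0

    def open(self):
        # Only a real transition from closed to open counts as an open.
        if not self._is_open:
            self._is_open = True
            self.open_count += 1

    def close(self):
        self._is_open = False

    def get(self, index):
        if not self._is_open:
            raise RuntimeError("database is closed")
        return self._records[index]

    def __len__(self):
        return len(self._records)


def enumerate_entries(db, stayopen):
    """Walk every record; reopen the database per record unless stayopen is true."""
    entries = []
    if stayopen:
        db.open()
    for index in range(len(db)):
        if not stayopen:
            db.open()
        entries.append(db.get(index))
        if not stayopen:
            db.close()
    if stayopen:
        db.close()
    return entries


records = [f"group{i}" for i in range(24000)]

slow_db = CountingDb(records)
fast_db = CountingDb(records)
slow_entries = enumerate_entries(slow_db, stayopen=False)
fast_entries = enumerate_entries(fast_db, stayopen=True)

print(slow_db.open_count)  # 24000: one open per record
print(fast_db.open_count)  # 1: a single open for the whole walk
```

Both walks return the same entries; only the number of opens differs. In a real library each extra open can mean extra disk reads, which is why the cost grows with the size of the directory.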

The line of code that mattered

That open-close cycle had a dramatic effect at scale. During a single enumeration, it triggered 48,422 repeated disk reads, creating a major performance bottleneck and slowing the system far beyond what the customer could accept. The cost was large enough to overwhelm the lookup path entirely.

The customer’s suspicion about stayopen turned out to be exactly right. Once that behavior was confirmed in the code, our engineer created a patch to force the database connection to remain open during enumeration.

How performance was restored

The fix improved the lookup behavior immediately. By keeping the database open across the full enumeration sequence, it eliminated the repeated disk reads and restored performance for the customer.

Once the patch from our Support engineer had been validated, the case was escalated to Canonical Sustaining Engineering so the issue could be tracked formally and moved toward a broader fix. A Launchpad bug was then created to document the problem and propose the change upstream, including for newer Ubuntu releases such as Noble.

Why this case stands out

This case is a good example of what technical support looks like when the answer is deeper than a package upgrade, a configuration change, or a standard workaround. Those are usually faster options because they can solve problems without changing the software itself. Here, however, the issue lived in the code, so a deeper investigation was required. The resolution depended on careful reproduction, close collaboration with a technically knowledgeable customer, and a willingness to read through old source code until the underlying behavior was fully understood.

It also highlights the practical value of long-term support. Even though this issue lived in an old component that had long since fallen outside the attention of most of the broader community, it was still possible to investigate, patch, and escalate it because the customer had an Ubuntu Pro with Support subscription. That combination of long-term security maintenance and direct engineering expertise made it possible to solve a problem that otherwise might have remained unresolved.

If you’re looking to understand how Canonical Support works, or explore the value, stability, and reliability it can bring to your organization and systems, we recommend you visit our dedicated Support page.

And if you’re looking for help with a custom project, or just want to find out what your available support options are, please don’t hesitate to contact us.

More from this series

When an upstream change broke smartcard FIPS authentication and how we fixed it 
