Small PRs, big speedups: The Ruby performance work you almost missed

Normally I just fire off a tweet when I spot a nice performance PR landing in Ruby. Lately I’ve been catching up on a backlog of Ruby performance work I’d bookmarked and never gotten around to – so some of what’s below isn’t brand new, with a few PRs dating back to 2025. There are so many of them – some headline-grabbing, some small but delightfully clever – that a thread won’t cut it. So here’s a roundup instead, both the recent landings and the ones I’m late to.

A few ground rules: every PR below ships a concrete benchmark number, so when I say “Nx faster” it’s the author’s own measurement, not vibes. Numbers come from different machines and workloads, so treat them as “here’s the win on the benchmark that motivated the change,” not cross-comparable lab results. Click through to any PR for the full picture – most authors document their methodology beautifully.

Let’s go.

Strings & text

  • String#scrub skips ASCII runs – Instead of decoding a string character-by-character, scrub now jumps over ASCII runs using the same search_nonascii trick valid_encoding? uses. On English HTML it’s up to 45.55x faster, on Japanese HTML 22.71x, and ~3.5x on the general case – with no regression on the worst case. Beautiful work by FletcherDares, who’s been on a string-performance tear.
  • String#codepoints ASCII hot path – Same author, same instinct: add a local fast path for ASCII bytes inside mostly-ASCII UTF-8 strings. Result: ~1.9x faster on mixed ASCII content, neutral on pure multibyte.
  • String#gsub! stops copying on no-match – gsub! was eagerly copying shared backing storage even when nothing matched. Defer that copy until the first real match (like sub! already does) and you get 2.33x faster no-match calls – and the allocation on a 100k-char shared string drops from 100,041 bytes to 40 bytes.
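
The trick shared by the scrub and codepoints changes is finding where an ASCII run ends without decoding characters one at a time. Below is a minimal Ruby sketch of the word-at-a-time test. `first_non_ascii_index` and `HIGH_BITS` are hypothetical names, this is not CRuby's C `search_nonascii`, and written in Ruby it only shows the logic rather than being fast. It assumes Ruby 3.1+ for the `offset:` keyword of `String#unpack1`.

```ruby
# If none of the 8 bytes in a 64-bit word has its high bit set, all 8 are ASCII.
HIGH_BITS = 0x8080_8080_8080_8080

# Hypothetical helper: byte offset of the first non-ASCII byte, or nil.
def first_non_ascii_index(str)
  i = 0
  size = str.bytesize
  # Fast path: test 8 bytes per step with a single AND.
  while i + 8 <= size
    word = str.unpack1("Q<", offset: i)
    break unless (word & HIGH_BITS).zero?
    i += 8
  end
  # Slow path: the tail, or the word that contained a non-ASCII byte.
  while i < size
    return i if str.getbyte(i) >= 0x80
    i += 1
  end
  nil
end

p first_non_ascii_index("hello, world")                # => nil
p first_non_ascii_index("plain ASCII, then caf\u00e9") # => 21
```

In a scrub-style loop, everything before the returned offset can be copied verbatim, and per-character validation only has to start there.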

Files & directories (the byroot file-IO spree)

byroot (Jean Boussier) went on a tear through Ruby’s file primitives, and the numbers are spicy:

  • File.join common case – Optimistically handle the common “two UTF-8 strings” case and scan backwards for the separator. Up to 18.81x faster for many-string joins, 7.80x for two strings.
  • File.extname for common encodings – Skip multibyte handling for known-safe encodings. Up to 6.17x faster on long paths.
  • File.expand_path single-byte fast path – A single-byte-encoding fast path nets a 2.67x speedup.
  • Dir.scan yields entry type – Yield each child’s type straight from struct dirent’s d_type, avoiding a separate stat per child. Recursive directory walks come out 2.12x faster (“twice as fast”).
  • dir.c caches the working directory – Cache and cheaply revalidate pwd with a stack buffer instead of always heap-allocating. Up to 1.33x faster Dir.pwd on Linux.
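
Here's a simplified Ruby sketch of the two-string File.join case. `join2` is a hypothetical helper aiming to match File.join for two plain "/" paths; it ignores File::ALT_SEPARATOR, arrays and non-String arguments. It scans backwards from the end of the first string over trailing "/" bytes, then decides whether to drop them, keep them, or add one. Scanning from the end is safe in UTF-8 because every byte of a multibyte character has its high bit set, so a 0x2F byte is always a real "/". In some legacy encodings an ASCII-looking byte can be the second half of a character, which is presumably why the general path has to walk forward.

```ruby
SLASH = "/".ord # 0x2F

# Hypothetical two-argument join for "/"-separated paths.
def join2(a, b)
  # Walk backwards over any trailing separators in `a`.
  tail = a.bytesize
  tail -= 1 while tail > 0 && a.getbyte(tail - 1) == SLASH

  if b.start_with?("/")
    # `b` brings its own separator: drop a's trailing ones.
    a.byteslice(0, tail) + b
  elsif tail < a.bytesize
    # `a` already ends in a separator: keep it as-is.
    a + b
  else
    # Neither side has one at the boundary: add exactly one.
    a + "/" + b
  end
end

p join2("usr/", "/bin") # => "usr/bin"
```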

GC & object allocation

  • Clear page bits in one shot – jhawthorn (John Hawthorn) turned age bits into a bit plane so age + wb_unprotected bits clear for a whole 64-slot page at once during sweep. ~14% off object-new.
  • Move rb_class_allocate_instance into gc.c – Also jhawthorn: relocating the function lets allocation helpers inline with newobj. ~10–15% faster Object.allocate (1.15x).
  • Remove the class alloc check – jhawthorn again, demoting a runtime allocation-class check to a debug-only assert and unlocking tail-call optimization. ~10% faster Object.new (1.12x).
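
A toy Ruby model of the bit-plane idea follows. `AgePlanes` is hypothetical and not gc.c's data structure. Each slot's 2-bit age is split across two 64-bit words, so resetting the ages of every freed slot covered by one word is a single AND per plane. With ages interleaved as adjacent bit pairs, you would first have to widen the 64-bit freed mask into 128 bits.

```ruby
# Hypothetical model of one bitmap word covering 64 heap slots. Each slot
# has a 2-bit age split across two "planes": bit i of @lo is the low age
# bit of slot i, bit i of @hi is its high age bit.
class AgePlanes
  MASK64 = (1 << 64) - 1

  def initialize
    @lo = 0
    @hi = 0
  end

  def set_age(slot, age)
    bit = 1 << slot
    @lo = age[0] == 1 ? (@lo | bit) : (@lo & ~bit)
    @hi = age[1] == 1 ? (@hi | bit) : (@hi & ~bit)
  end

  def age(slot)
    (@hi[slot] << 1) | @lo[slot]
  end

  # Reset the age of every slot set in freed_mask: one AND per plane,
  # no per-slot loop and no widening of the mask.
  def clear!(freed_mask)
    keep = ~freed_mask & MASK64
    @lo &= keep
    @hi &= keep
  end
end

planes = AgePlanes.new
planes.set_age(0, 3)
planes.set_age(5, 2)
planes.clear!(1 << 0)              # slot 0 was freed during sweep
p [planes.age(0), planes.age(5)]   # => [0, 2]
```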

Concurrency & core classes

  • Speed up TypedData_Get_Struct – byroot added an inlinable fast path to rb_check_typeddata, which makes Mutex#synchronize and Monitor#synchronize ~1.54x / ~1.55x faster respectively.
  • Thread::Queue uses a ring buffer – Swapping the backing array for a ring buffer removes array-function overhead: ~23% faster (1.24x). byroot.
  • Give the hot thread scheduler priority – jpl-coconut reworked thread switching to avoid an intermediate monitor-thread hop. On a 2-core setup the motivating benchmark went from 1.455s to 0.231s (and a heavier scenario from 36.7s to 4.1s).
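
For readers who haven't met one, here's a minimal single-threaded ring buffer in Ruby. `RingQueue` is hypothetical, not the C code behind Thread::Queue, and has none of its locking or blocking. `push` writes at `head + size` modulo capacity, `shift` just advances `head`, and elements only move when the buffer doubles. In pure Ruby this won't beat Array; the point is the index arithmetic.

```ruby
# Hypothetical growable FIFO backed by a fixed-size array used circularly.
class RingQueue
  attr_reader :size

  def initialize(capacity = 4)
    @buf = Array.new([capacity, 1].max)
    @head = 0 # index of the oldest element
    @size = 0
  end

  def push(obj)
    grow if @size == @buf.size
    @buf[(@head + @size) % @buf.size] = obj
    @size += 1
    self
  end

  def shift
    return nil if @size.zero?

    obj = @buf[@head]
    @buf[@head] = nil # drop the reference so the object can be collected
    @head = (@head + 1) % @buf.size
    @size -= 1
    obj
  end

  private

  # Copy elements out in logical order into a buffer twice as large.
  def grow
    new_buf = Array.new(@buf.size * 2)
    @size.times { |i| new_buf[i] = @buf[(@head + i) % @buf.size] }
    @buf = new_buf
    @head = 0
  end
end

q = RingQueue.new(2)
q.push(:a).push(:b)
p q.shift # => :a
```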

Parser & build

  • Parallelize bundled gem tests – Not a runtime win, but st0012 (Stan Lo) made CI run gem tests through a thread pool tied into the make jobserver, shaving ~40% off that CI step across platforms.
  • Prism parser optimizations – kddnewton (Kevin Newton) packed in fast/slow path splitting, scope bloom filters, SIMD/SWAR strpbrk, a wyhash word-at-a-time constant pool, and a parser arena. ~22% faster parsing at roughly the same memory. (The matching ruby/ruby side is #16418.)
  • Optimize the Prism Ruby visitor – Replace the array-allocating compact_child_nodes with an each_child_node that yields directly. Visiting the Rails codebase came out ~21% faster on the interpreter and roughly 2.3x faster under YJIT.
  • Lazily deserialize DefNode – Defer DefNode deserialization in the Java loader so JRuby/TruffleRuby don’t pay for method bodies up front: ~1.5x faster on the parsing-core metric.
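
The visitor change boils down to allocating versus yielding iteration. Here's a minimal sketch with a hypothetical `Node` struct; Prism's real node classes are generated and carry many more fields, and only the two method names come from the PR description above.

```ruby
# Hypothetical stand-in for a syntax tree node (not Prism's generated classes).
Node = Struct.new(:name, :left, :right) do
  # Allocating style: builds a fresh array of non-nil children on every call.
  def compact_child_nodes
    [left, right].compact
  end

  # Yielding style: hands each non-nil child straight to the block.
  def each_child_node
    return to_enum(:each_child_node) unless block_given?

    yield left if left
    yield right if right
    self
  end
end

# Depth-first walk that counts nodes without building child arrays.
def count_nodes(node)
  total = 1
  node.each_child_node { |child| total += count_nodes(child) }
  total
end

tree = Node.new(:root, Node.new(:a, Node.new(:leaf, nil, nil), nil), Node.new(:b, nil, nil))
p count_nodes(tree) # => 4
```

Reading `GC.stat(:total_allocated_objects)` before and after a large traversal is one way to see the per-node arrays disappear when switching styles.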

BigDecimal goes brrr

tompng (Tomoya Ishida) has been quietly doing extraordinary things to BigDecimal:

  • NTT multiplication + Newton-Raphson division – O(n log n) multiplication via a three-prime Number Theoretic Transform. The headline is almost comical: up to 800,000x faster multiplication. A squaring that was estimated at 270 days now runs in 29 seconds. This is the kind of PR you frame on a wall.
  • Increase VpMult batch size – Bumping the divmod batch from 8 to 16 makes mid-size multiplications ~1.8x faster. tompng.
  • Optimize BigDecimal#to_s – byroot replaced two snprintf calls with a lean integer-to-ASCII routine: ~2.6x faster for small numbers, ~3.8x for large ones.
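
To make "multiplication via NTT" concrete, here's a compact single-prime sketch in Ruby on little-endian base-10 digit arrays. It's an illustration under assumptions, not tompng's implementation: 998244353 with primitive root 3 is the textbook NTT-friendly prime, not necessarily one of the PR's three. Base 10 keeps every convolution coefficient (at most 81 times the shorter input's length) below the modulus, so one prime suffices here, as long as the transform length stays at or below 2**23. With larger limbs or longer inputs the coefficients outgrow a single prime, which is the usual reason to run the transform under three primes and recombine with the Chinese Remainder Theorem.

```ruby
# Textbook NTT-friendly prime: 998244353 = 119 * 2**23 + 1, primitive root 3.
MOD = 998_244_353
ROOT = 3

# In-place iterative number-theoretic transform; inverse when invert is true.
# a.size must be a power of two no larger than 2**23.
def ntt!(a, invert)
  n = a.size

  # Reorder elements into bit-reversed index order.
  j = 0
  (1...n).each do |i|
    bit = n >> 1
    while (j & bit) != 0
      j ^= bit
      bit >>= 1
    end
    j ^= bit
    a[i], a[j] = a[j], a[i] if i < j
  end

  # Butterfly passes over blocks of length 2, 4, 8, ...
  len = 2
  while len <= n
    w_len = ROOT.pow((MOD - 1) / len, MOD)    # primitive len-th root of unity
    w_len = w_len.pow(MOD - 2, MOD) if invert # its inverse, via Fermat
    half = len / 2
    (0...n).step(len) do |start|
      w = 1
      half.times do |k|
        u = a[start + k]
        v = a[start + k + half] * w % MOD
        a[start + k] = (u + v) % MOD
        a[start + k + half] = (u - v) % MOD # Ruby's % is non-negative here
        w = w * w_len % MOD
      end
    end
    len <<= 1
  end

  if invert
    n_inv = n.pow(MOD - 2, MOD)
    a.map! { |x| x * n_inv % MOD }
  end
  a
end

# Multiply two non-negative integers given as little-endian base-10 digit
# arrays (the format Integer#digits returns).
def ntt_multiply(x_digits, y_digits)
  n = 1
  n <<= 1 while n < x_digits.size + y_digits.size
  fa = ntt!(x_digits + [0] * (n - x_digits.size), false)
  fb = ntt!(y_digits + [0] * (n - y_digits.size), false)
  # Pointwise product in the transformed domain = convolution of the digits.
  prod = ntt!(fa.zip(fb).map { |p, q| p * q % MOD }, true)

  # Carry propagation turns convolution coefficients back into digits.
  carry = 0
  digits = prod.map do |c|
    c += carry
    carry = c / 10
    c % 10
  end
  while carry > 0
    digits << carry % 10
    carry /= 10
  end
  digits.pop while digits.size > 1 && digits.last.zero?
  digits
end

def from_digits(digits)
  digits.reverse.join.to_i
end

a = 12_345_678_901_234_567_890
b = 98_765_432_109_876_543_210
p from_digits(ntt_multiply(a.digits, b.digits)) == a * b # => true
```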

JIT corner

Quick hits

A few more that are smaller in scope but very much worth a click – and a thank-you to each author:

Closing

If you like performance magic, go read these. And if you maintain a gem, read them twice – a lot of what’s here (back-to-front scanning, single-byte fast paths, deferring copies, avoiding stat) is worth learning from.

Thanks to everyone credited here for the work.

The post Small PRs, big speedups: The Ruby performance work you almost missed appeared first on Closer to Code.