Small PRs, Big Speedups: The Ruby Performance Work You Almost Missed

Normally I just fire off a tweet when I spot a nice performance PR landing in Ruby. Lately I’ve been catching up on a backlog of Ruby performance work I’d bookmarked and never gotten around to – so some of what’s below isn’t brand new, with a few PRs dating back to 2025. There were so many of them – some headline-grabbing, some small but delightfully clever – that a thread won’t cut it. So here’s a roundup instead, covering both the recent landings and the ones I’m late to.

A few ground rules: every PR below ships a concrete benchmark number, so when I say “Nx faster” it’s the author’s own measurement, not vibes. Numbers come from different machines and workloads, so treat them as “here’s the win on the benchmark that motivated the change,” not cross-comparable lab results. Click through to any PR for the full picture – most authors document their methodology beautifully.

Let’s go.

Strings & text

String#scrub skips ASCII runs – Instead of decoding a string character by character, scrub now jumps over ASCII runs using the same search_nonascii trick that valid_encoding? uses. On English HTML it’s up to 45.55x faster, on Japanese HTML 22.71x, and ~3.5x in the general case – with no regression in the worst case. Beautiful work by FletcherDares, who’s been on a string-performance tear.
String#codepoints ASCII hot path – Same author, same instinct: add a local fast path for ASCII bytes inside mostly-ASCII UTF-8 strings. Result: ~1.9x faster on mixed ASCII content, neutral on pure multibyte.
String#gsub! stops copying on no-match – gsub! was eagerly copying shared backing storage even when nothing matched. Defer that copy until the first real match (like sub! already does) and you get 2.33x faster no-match calls – and the allocation on a 100k-char shared string drops from 100,041 bytes to 40 bytes.
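For a feel of why skipping ASCII runs pays off, here is a rough pure-Ruby sketch of a scrub-style pass over UTF-8. It is not the PR’s C code: the name scrub_utf8_sketch is made up, a byte-range regexp stands in for search_nonascii, and invalid input is replaced one byte at a time, which is cruder than how the real String#scrub groups malformed sequences.

```ruby
# Hypothetical sketch, not CRuby's implementation: replace invalid UTF-8 bytes,
# copying whole ASCII runs in one step. Error handling is simplified to one
# replacement character per invalid byte.
NON_ASCII = /[\x80-\xFF]/n

def scrub_utf8_sketch(str, replacement = "\uFFFD")
  bytes = str.b # raw bytes (ASCII-8BIT copy), so indexes are byte offsets
  out = String.new(encoding: Encoding::UTF_8)
  pos = 0
  while pos < bytes.bytesize
    # Fast path: find where the current ASCII run ends and copy it in one go.
    run_end = bytes.index(NON_ASCII, pos) || bytes.bytesize
    if run_end > pos
      out << bytes.byteslice(pos, run_end - pos).force_encoding(Encoding::UTF_8)
      pos = run_end
      next
    end

    # Slow path: size one character from its lead byte, then validate it.
    lead = bytes.getbyte(pos)
    len =
      if lead.between?(0xC2, 0xDF) then 2
      elsif lead.between?(0xE0, 0xEF) then 3
      elsif lead.between?(0xF0, 0xF4) then 4
      else 1 # stray continuation byte or impossible lead byte
      end
    char = bytes.byteslice(pos, len).force_encoding(Encoding::UTF_8)
    if len > 1 && char.bytesize == len && char.valid_encoding?
      out << char
      pos += len
    else
      out << replacement
      pos += 1
    end
  end
  out
end
```

On mostly-ASCII input almost all the work happens in the first branch, which is the same intuition behind the English HTML numbers above.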

Files & directories (the byroot file-IO spree)

byroot (Jean Boussier) went on a tear through Ruby’s file primitives, and the numbers are spicy:

File.join common case – Optimistically handle the common “two UTF-8 strings” case and scan backwards for the separator. Up to 18.81x faster for many-string joins, 7.80x for two strings.
File.extname for common encodings – Skip multibyte handling for known-safe encodings. Up to 6.17x faster on long paths.
File.expand_path single-byte fast path – A fast path for single-byte encodings nets a 2.67x speedup.
Dir.scan yields entry type – Yield each child’s type straight from the d_type field of struct dirent, avoiding a separate stat call per child. Recursive directory walks come out 2.12x faster (“twice as fast”).
dir.c caches the working directory – Cache and cheaply revalidate pwd with a stack buffer instead of always heap-allocating. Up to 1.33x faster Dir.pwd on Linux.
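Why is UTF-8 the easy case for File.join? One plausible core of it, sketched below in Ruby: in UTF-8, bytes below 0x80 never appear inside a multibyte character, so the last byte of the directory and the first byte of the name can be compared against "/" directly, with no character decoding. join2_sketch is a hypothetical two-argument helper, not the PR’s code; the real File.join also handles arrays, any number of arguments, other encodings, and platform-specific separators.

```ruby
# Hypothetical two-argument join fast path for ASCII/UTF-8 paths.
# UTF-8 continuation bytes are 0x80..0xBF, so an edge byte equal to "/"
# (0x2F) is always a real separator and can be checked without decoding.
SEPARATOR_BYTE = "/".ord

def join2_sketch(dir, name)
  dir_has_sep  = !dir.empty? && dir.getbyte(-1) == SEPARATOR_BYTE
  name_has_sep = !name.empty? && name.getbyte(0) == SEPARATOR_BYTE

  if dir_has_sep && name_has_sep
    dir + name.byteslice(1..) # drop the duplicate separator
  elsif dir_has_sep || name_has_sep
    dir + name                # exactly one separator is already present
  else
    "#{dir}/#{name}"
  end
end
```

For simple inputs it agrees with File.join, e.g. join2_sketch("usr/", "/lib") gives "usr/lib".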

GC & object allocation

Clear page bits in one shot – jhawthorn (John Hawthorn) turned the age bits into a bit plane, so the age and wb_unprotected bits for a whole 64-slot page can be cleared at once during sweep. ~14% off object-new.
Move rb_class_allocate_instance into gc.c – Also jhawthorn: relocating the function lets allocation helpers inline with newobj. ~10–15% faster Object.allocate (1.15x).
Remove the class alloc check – jhawthorn again, demoting a runtime allocation-class check to a debug-only assert and unlocking tail-call optimization. ~10% faster Object.new (1.12x).
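A “bit plane” just means storing each bit of a multi-bit field in its own bitmap. The toy model below assumes 2-bit ages and one bitmap word per 64 slots; the class and method names are hypothetical, not gc.c’s. Once the data is laid out this way, clearing the metadata for every freed slot is a handful of word-wide AND-NOT operations instead of a per-slot loop.

```ruby
# Hypothetical toy model (not CRuby's gc.c): GC metadata for 64 slots, where
# slot i owns bit i of each Integer used as a bitmap word.
class SlotMetadata
  attr_reader :wb_unprotected

  def initialize
    @age_lo = 0         # bit plane holding the low bit of each slot's age
    @age_hi = 0         # bit plane holding the high bit of each slot's age
    @wb_unprotected = 0 # one flag bit per slot
  end

  def set_age(slot, age)
    bit = 1 << slot
    @age_lo = (age & 1).zero? ? @age_lo & ~bit : @age_lo | bit
    @age_hi = (age & 2).zero? ? @age_hi & ~bit : @age_hi | bit
  end

  def age(slot)
    (((@age_hi >> slot) & 1) << 1) | ((@age_lo >> slot) & 1)
  end

  def mark_wb_unprotected(slot)
    @wb_unprotected |= 1 << slot
  end

  # Sweep step: reset age and wb_unprotected for every freed slot at once,
  # with one AND-NOT per bitmap word instead of a loop over the 64 slots.
  def clear_freed(freed_mask)
    keep = ~freed_mask
    @age_lo &= keep
    @age_hi &= keep
    @wb_unprotected &= keep
  end
end
```

With ages interleaved as 2-bit pairs in a single bitmap, the same reset would need a mask with a two-bit pattern per slot or a loop; splitting them into planes makes the freed-slot mask directly usable on every word.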

Concurrency & core classes

Speed up TypedData_Get_Struct – byroot added an inlinable fast path to rb_check_typeddata, which makes Mutex#synchronize and Monitor#synchronize ~1.54x / ~1.55x faster respectively.
Thread::Queue uses a ring buffer – Swapping the backing array for a ring buffer removes array-function overhead: ~23% faster (1.24x). byroot.
Give the hot thread scheduler priority – jpl-coconut reworked thread switching to avoid an intermediate monitor-thread hop. On a 2-core setup the motivating benchmark went from 1.455s to 0.231s (and a heavier scenario from 36.7s to 4.1s).
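The ring buffer itself is a classic structure: a fixed array plus a head index and a count, where push and shift only do index arithmetic modulo the capacity. The Ruby sketch below (RingBuffer is a hypothetical name) shows that layout only; the real Thread::Queue lives in C and adds the locking and blocking pops that this version leaves out.

```ruby
# Hypothetical single-threaded ring buffer FIFO. Not thread-safe and with no
# blocking pop, unlike Thread::Queue; it only illustrates the data layout.
class RingBuffer
  attr_reader :size

  def initialize(capacity = 8)
    raise ArgumentError, "capacity must be positive" unless capacity.positive?
    @buf = Array.new(capacity)
    @head = 0 # index of the oldest element
    @size = 0
  end

  def push(obj)
    grow if @size == @buf.size
    @buf[(@head + @size) % @buf.size] = obj # write at the logical tail
    @size += 1
    self
  end

  def shift
    return nil if @size.zero?
    obj = @buf[@head]
    @buf[@head] = nil # drop the reference so the GC can reclaim it
    @head = (@head + 1) % @buf.size
    @size -= 1
    obj
  end

  private

  # Double the capacity, copying elements oldest-first so @head becomes 0.
  def grow
    bigger = Array.new(@buf.size * 2)
    @size.times { |i| bigger[i] = @buf[(@head + i) % @buf.size] }
    @buf = bigger
    @head = 0
  end
end
```

Usage is the familiar queue shape: q = RingBuffer.new; q.push(:job); q.shift returns :job.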

Parser & build

Parallelize bundled gem tests – Not a runtime win, but st0012 (Stan Lo) made CI run gem tests through a thread pool tied into the make jobserver, shaving ~40% off that CI step across platforms.
Prism parser optimizations – kddnewton (Kevin Newton) packed in fast/slow path splitting, scope Bloom filters, SIMD/SWAR strpbrk, a wyhash word-at-a-time constant pool, and a parser arena. ~22% faster parsing at roughly the same memory. (The matching ruby/ruby side is #16418.)
Optimize the Prism Ruby visitor – Replace the array-allocating compact_child_nodes with an each_child_node that yields directly. Visiting the Rails codebase came out ~21% faster on the interpreter and roughly 2.3x faster under YJIT.
Lazily deserialize DefNode – Defer DefNode deserialization in the Java loader so JRuby/TruffleRuby don’t pay for method bodies up front: ~1.5x faster on the parsing-core metric.
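The visitor change boils down to allocating versus yielding. Here it is on a hypothetical two-child Node struct rather than Prism’s real node classes: compact_child_nodes builds a fresh Array on every call, while each_child_node hands each child straight to the block, so a full tree walk skips the per-node Array entirely.

```ruby
# Hypothetical binary-tree node standing in for Prism's node classes.
Node = Struct.new(:name, :left, :right) do
  # Allocating style: a new Array of the non-nil children on every call.
  def compact_child_nodes
    compact = []
    compact << left if left
    compact << right if right
    compact
  end

  # Yielding style: pass each non-nil child to the block, no Array needed.
  def each_child_node
    return enum_for(:each_child_node) unless block_given?
    yield left if left
    yield right if right
    self
  end
end

# Count nodes with each style; both walk the same tree.
def count_allocating(node)
  1 + node.compact_child_nodes.sum { |child| count_allocating(child) }
end

def count_yielding(node)
  total = 1
  node.each_child_node { |child| total += count_yielding(child) }
  total
end
```

Both walkers return the same count; the difference is only in how many short-lived Arrays the first one leaves for the GC.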

BigDecimal goes brrr

tompng (Tomoya Ishida) has been quietly doing extraordinary things to BigDecimal:

NTT multiplication + Newton-Raphson division – O(n log n) multiplication via a three-prime number-theoretic transform (NTT), with Newton-Raphson iteration for division. The headline is almost comical: up to 800,000x faster multiplication. A squaring that was estimated at 270 days now runs in 29 seconds. This is the kind of PR you frame on a wall.
Increase VpMult batch size – Bumping the divmod batch from 8 to 16 makes mid-size multiplications ~1.8x faster. tompng.
Optimize BigDecimal#to_s – byroot replaced two snprintf calls with a lean integer-to-ASCII routine: ~2.6x faster for small numbers, ~3.8x for large ones.
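Here is what NTT-based multiplication looks like in miniature. It is a single-prime, textbook version, not tompng’s implementation: the PR uses three primes (and Newton-Raphson for division), while this sketch uses one well-known NTT prime, 998244353 = 119·2^23 + 1 with primitive root 3, and plain base-10 digits. That is only safe while every convolution coefficient, at most 81 times the shorter operand’s digit count, stays below the prime; lifting that limit is the usual reason to combine several primes via the Chinese remainder theorem.

```ruby
# Hypothetical single-prime NTT multiplication of decimal strings.
# Not BigDecimal's implementation, which per the PR uses three primes.
NTT_PRIME = 998_244_353 # 119 * 2**23 + 1
NTT_ROOT = 3            # a primitive root modulo NTT_PRIME

# In-place iterative Cooley-Tukey transform; a.size must be a power of two.
def ntt!(a, invert)
  n = a.size
  j = 0
  (1...n).each do |i| # bit-reversal permutation
    bit = n >> 1
    while (j & bit) != 0
      j ^= bit
      bit >>= 1
    end
    j ^= bit
    a[i], a[j] = a[j], a[i] if i < j
  end

  len = 2
  while len <= n
    w_len = NTT_ROOT.pow((NTT_PRIME - 1) / len, NTT_PRIME) # len-th root of unity
    w_len = w_len.pow(NTT_PRIME - 2, NTT_PRIME) if invert  # its inverse
    half = len / 2
    (0...n).step(len) do |start|
      w = 1
      half.times do |k| # butterflies
        u = a[start + k]
        v = a[start + k + half] * w % NTT_PRIME
        a[start + k] = (u + v) % NTT_PRIME
        a[start + k + half] = (u - v) % NTT_PRIME
        w = w * w_len % NTT_PRIME
      end
    end
    len <<= 1
  end

  if invert # divide by n to finish the inverse transform
    n_inv = n.pow(NTT_PRIME - 2, NTT_PRIME)
    a.map! { |x| x * n_inv % NTT_PRIME }
  end
  a
end

def ntt_multiply(x, y)
  a = x.reverse.each_char.map(&:to_i) # least significant digit first
  b = y.reverse.each_char.map(&:to_i)
  n = 1
  n <<= 1 while n < a.size + b.size   # room for every digit of the product
  a.concat(Array.new(n - a.size, 0))
  b.concat(Array.new(n - b.size, 0))

  ntt!(a, false)
  ntt!(b, false)
  c = ntt!(a.zip(b).map { |s, t| s * t % NTT_PRIME }, true) # pointwise product

  carry = 0
  digits = c.map do |coef| # propagate carries in base 10
    total = coef + carry
    carry = total / 10
    total % 10
  end
  while carry > 0
    digits << carry % 10
    carry /= 10
  end
  digits.pop while digits.size > 1 && digits.last.zero?
  digits.reverse.join
end
```

The transforms cost O(n log n) each and the pointwise product is O(n), which is where the gap over schoolbook O(n²) multiplication comes from once operands get long.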

JIT corner

Fix RCLASS_EXT_WRITABLE perf – luke-gruber swapped FL_TEST/FL_SET for their _RAW variants, dropping a YJIT getivar benchmark from 60ms to 40ms (~1.5x).
Rewrite Array#find in Ruby – swebb reimplemented Array#find in Ruby so the JIT can chew on it: ~1.96x faster under YJIT, neutral on the interpreter.
ZJIT recompiles getivar on shape-guard failure – k0kubun (Takashi Kokubun) cut guard_shape_failure side exits on the lobsters benchmark from 22.5% down to 3.0%, keeping more code in ZJIT.
Annotate Float predicates – Teaching ZJIT about Float#nan? / finite? / infinite? lets it emit the fast C-call path: ~21–27% faster on those predicates in a tight loop.
ZJIT also keeps growing its instruction set – specializing Method#call and adding an ArrayAset HIR instruction for array element assignment – each shaving a few percent off the relevant wall-clock benchmarks.
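The Array#find rewrite is easy to picture. Below is a hypothetical pure-Ruby equivalent (ruby_find is a made-up name, not the code that landed): a plain while loop that yields each element, which a JIT like YJIT can compile as ordinary Ruby instead of going through a C function that calls back into the block for every element.

```ruby
# Hypothetical pure-Ruby find over an Array, in the spirit of the PR.
# Mirrors Enumerable#find's optional if_none callable.
def ruby_find(array, if_none = nil)
  return enum_for(:ruby_find, array, if_none) unless block_given?
  i = 0
  while i < array.size
    element = array[i]
    return element if yield(element) # first truthy block result wins
    i += 1
  end
  if_none&.call # no match: call if_none if given, otherwise nil
end
```

For example, ruby_find([1, 2, 3, 4]) { |x| x.even? } returns 2, just like [1, 2, 3, 4].find(&:even?).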

Quick hits

A few more that are smaller in scope but very much worth a click – and a thank-you to each author:

Integer#to_s two-digit lookup table – emit two digits per loop iteration; up to ~33% faster on large Fixnums.
NilClass methods moved to Ruby – Hartley McGuire made nil.to_c / to_r JIT-friendly: up to 3.5x faster.
OPTIMIZED_CMP in r_less – speeds up Range#cover? / Range#overlap? by up to ~3x.
Declaring weak references – a new rb_gc_declare_weak_references API trims WeakMap overhead: ~60% faster WeakMap#[]=.
Remove a wasted allocation in BER integer packing – khasinski (Chris Hasiński) shaved ~50% off Array#pack with the 'w' format.
Optimize Lrama – the parser generator gets faster, cutting Ruby’s own parse.y processing from 2.84s to 1.60s (~1.78x).
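The two-digit trick in miniature: divide by 100 instead of 10, and look both digits up in a 100-entry table of zero-padded strings. This Ruby sketch (to_s_two_digits and DIGIT_PAIRS are hypothetical names) collects the pieces in an Array, so it only illustrates the idea; it will not beat the built-in Integer#to_s, which is implemented in C.

```ruby
# Hypothetical sketch: one divmod by 100 plus one table lookup yields two
# output digits per loop iteration.
DIGIT_PAIRS = (0..99).map { |i| format("%02d", i) }.freeze

def to_s_two_digits(n)
  return "0" if n.zero?
  negative = n.negative?
  n = -n if negative
  pairs = []
  while n >= 100
    n, rem = n.divmod(100)
    pairs << DIGIT_PAIRS[rem] # low two digits, zero-padded
  end
  # The leading group has one or two digits; skip its padding zero.
  pairs << (n < 10 ? DIGIT_PAIRS[n][1] : DIGIT_PAIRS[n])
  (negative ? "-" : "") + pairs.reverse.join
end
```

Halving the number of divisions is the whole point: each iteration does the work of two in a one-digit-at-a-time loop.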

Closing

If you like performance magic, go read these. And if you maintain a gem, read them twice – a lot of what’s here (back-to-front scanning, single-byte fast paths, deferring copies, skipping per-entry stat calls) is worth learning from.

Thanks to everyone credited here for the work.

The post Small PRs, big speedups: The Ruby performance work you almost missed appeared first on Closer to Code.

Discover more from Ubuntu-Server.com
