Update: This article originally concluded that Eisel-Lemire wasn’t worth it for Ruby. I was wrong. After revisiting the problem, I found a way to make it work – and submitted a PR to Ruby. Read the full update at the end.
Recently, I submitted a PR to Ruby that optimizes Float#to_s using the Ryu algorithm, achieving 2-4x speedups in float-to-string conversion. That work deserves its own write-up; this article is about what happened when I tried to optimize the other direction: string-to-float parsing.
String-to-float seemed like an equally promising target. It’s a fundamental operation used everywhere – parsing JSON, reading configuration files, processing CSV data, and handling user input. Since the Ryu optimization worked so well for float-to-string, surely the reverse direction would yield similar gains?
I did my research. I found a state-of-the-art algorithm backed by an academic paper. I implemented it. All tests passed. It worked exactly as promised.
And then I threw it all away.
The Eisel-Lemire algorithm, described by Daniel Lemire in his 2021 paper “Number Parsing at a Gigabyte per Second”, looked like exactly what I needed: a modern approach to converting decimal strings to floating-point numbers, built on 128-bit multiplication with precomputed powers of 5.
Rust uses it. Go adopted it in 1.16. The fast_float C++ library is built around it.
When two performance-conscious language communities both adopt the same algorithm, you pay attention.
I wrote about 1,100 lines of C: 128-bit multiplication helpers, a ~10KB lookup table for powers of 5, the core algorithm, and a wrapper matching Ruby’s existing strtod interface. For edge cases (hex floats, numbers with more than 19 significant digits, ambiguous rounding), it falls back to the original implementation. In practice, maybe 0.01% of inputs hit the fallback.
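The heart of those 1,100 lines is scaling the parsed 64-bit decimal mantissa by a table entry via a full 128-bit product. A minimal sketch of the multiplication helper (my actual helpers differ in detail, and the names here are illustrative):

```c
#include <stdint.h>

/* 64x64 -> 128-bit multiply, the primitive Eisel-Lemire relies on to
 * scale a decimal mantissa by a precomputed power of 5. */
typedef struct { uint64_t lo, hi; } u128;

static u128 umul128(uint64_t a, uint64_t b) {
#ifdef __SIZEOF_INT128__
    unsigned __int128 p = (unsigned __int128)a * b;
    return (u128){ (uint64_t)p, (uint64_t)(p >> 64) };
#else
    /* Schoolbook fallback for compilers without __int128. */
    uint64_t alo = a & 0xFFFFFFFFu, ahi = a >> 32;
    uint64_t blo = b & 0xFFFFFFFFu, bhi = b >> 32;
    uint64_t p00 = alo * blo, p01 = alo * bhi;
    uint64_t p10 = ahi * blo, p11 = ahi * bhi;
    uint64_t mid = p01 + (p00 >> 32) + (p10 & 0xFFFFFFFFu);
    u128 r;
    r.lo = (mid << 32) | (p00 & 0xFFFFFFFFu);
    r.hi = p11 + (p10 >> 32) + (mid >> 32);
    return r;
#endif
}
```

The high 64 bits of the product carry the bits that end up in the IEEE 754 significand; the low bits decide rounding and, in the ambiguous cases, trigger the fallback.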
All 59 Float tests passed. Round-trip verification worked.
So how much faster was it?
Here’s where I almost made a mistake.
I ran a benchmark with 3 million iterations across various float formats:
| Test Case | Unmodified | Eisel-Lemire | Speedup |
|---|---|---|---|
| Decimal (0.123456789) | 0.185s | 0.186s | 1.00x |
| Scientific notation | 0.162s | 0.182s | 0.89x |
| Math constants (Pi, E) | 0.538s | 0.205s | 2.62x |
| Currency values | 0.155s | 0.167s | 0.93x |
| Coordinates | 0.172s | 0.171s | 1.01x |
| Very small (1e-15) | 0.220s | 0.171s | 1.29x |
| Very large (1e15) | 0.218s | 0.169s | 1.29x |
| TOTAL | 2.316s | 1.948s | 1.19x |
19% faster overall. The math constants case was 2.62x faster. I was ready to open a PR.
But something about the benchmark bothered me. I’d designed it to cover “various float formats” – which sounds reasonable until you realize I was testing what I expected to matter, not what actually matters.
What numbers do Ruby applications actually parse?
Ruby runs web apps, reads config files, processes business data. It’s not crunching scientific datasets. The floats it sees are prices, percentages, coordinates, timeouts. Mostly simple stuff.
So I benchmarked that:
| Test Case | Unmodified | Eisel-Lemire | Change |
|---|---|---|---|
| Single digit (1-9) | 0.236s | 0.255s | -8% |
| Two digits (10-99) | 0.240s | 0.289s | -17% |
| Simple decimal (1.5, 2.0) | 0.244s | 0.281s | -13% |
| Price-like (9.99, 19.95) | 0.258s | 0.272s | -5% |
| Short decimal (0.5, 0.25) | 0.255s | 0.277s | -8% |
| Simple scientific (1e5) | 0.250s | 0.268s | -7% |
| Common short (3.14, 2.71) | 0.253s | 0.264s | -4% |
| TOTAL | 2.482s | 2.710s | -9% |
9% slower on simple numbers. The numbers Ruby actually parses.
Eisel-Lemire has fixed overhead: parse the string, look up powers of 5, do 128-bit multiplication, construct the IEEE 754 double. That overhead pays off when the alternative is expensive.
But Ruby’s existing strtod – based on David Gay’s code from 1991 – has been tuned for 30+ years. It has fast paths for simple inputs like "1.5" or "99.99". For those cases, the old code is already fast. Eisel-Lemire’s setup cost ends up being more expensive than the work it replaces.
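Those fast paths rest on a classic observation (due to Clinger) that Gay-style parsers exploit: when the decimal mantissa fits exactly in a double and the power of ten is itself exactly representable, a single multiply or divide incurs only one rounding and is therefore correctly rounded. A sketch of that check, with illustrative names rather than Ruby's actual internals:

```c
#include <stdbool.h>
#include <stdint.h>

/* Powers of ten up to 1e22 are exactly representable as doubles. */
static const double pow10_tab[] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
    1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
};

/* Returns true and stores the result if (mant, exp10) lies in the
 * exactly-representable region; otherwise the caller falls back to
 * the slow path. */
static bool clinger_fast_path(uint64_t mant, int exp10, double *out) {
    if (mant > (1ULL << 53)) return false;       /* mantissa not exact */
    if (exp10 >= 0 && exp10 <= 22) {
        *out = (double)mant * pow10_tab[exp10];  /* one rounding, exact */
        return true;
    }
    if (exp10 < 0 && exp10 >= -22) {
        *out = (double)mant / pow10_tab[-exp10]; /* one rounding, exact */
        return true;
    }
    return false;
}
```

For inputs like "1.5" (mantissa 15, exponent -1), this path is essentially a table load and one FPU instruction, which is what Eisel-Lemire's fixed setup cost was competing against.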
The algorithm works exactly as advertised. In my opinion, it just solves a different problem than the one Ruby has.
What if I used strtod for simple numbers and Eisel-Lemire only for complex ones?
| Approach | Total Time | vs Baseline |
|---|---|---|
| Unmodified strtod | 2.316s | baseline |
| Pure Eisel-Lemire | 1.948s | +19% |
| Hybrid (digit threshold=8) | 2.164s | +7% |
| Hybrid (digit threshold=10) | 2.194s | +6% |
| Hybrid (length-based) | 2.060s | +11% |
Any dispatch check adds overhead. Counting digits or checking string length isn’t free. The check itself eats into the gains.
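For illustration, the length-based variant looked roughly like this. The Eisel-Lemire parser is stubbed out here (the real one is the 1,100-line implementation), and the threshold is one of the values I tried, not a recommendation:

```c
#include <stdlib.h>
#include <string.h>

/* Stub standing in for the real Eisel-Lemire parser; only the
 * dispatch shape matters for this sketch. */
static double eisel_lemire_parse(const char *s) { return strtod(s, NULL); }

/* Length-based hybrid: short strings go to strtod's tuned fast paths,
 * long ones to Eisel-Lemire. Even this one strlen() and branch is
 * measurable when the whole parse takes tens of nanoseconds. */
static double hybrid_parse(const char *s) {
    return strlen(s) < 10 ? strtod(s, NULL) : eisel_lemire_parse(s);
}
```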
After publishing this article, I decided to revisit the problem. The insight came from re-reading Nigel Tao’s blog post, which mentions that the algorithm includes a “simple case” optimization for small mantissas that can be multiplied exactly by powers of 10.
The key realization: don’t fight strtod on its home turf. Instead of replacing strtod entirely, I added fast paths that intercept simple numbers before they ever reach either algorithm:
- Small integers – "5", "42", "-123" (up to 3 digits), with direct digit parsing
- Short decimals – "1.5", "9.99", "199.95" (up to 3+3 digits), using precomputed divisors
- strtod – for everything else, including the edge cases (hex floats, >19 digits, ambiguous rounding)

The fast paths are trivial – just a few comparisons and arithmetic operations. No 128-bit multiplication, no table lookups. For simple inputs, they’re faster than both strtod and Eisel-Lemire.
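A sketch of the short-decimal path under those constraints. Names, and the exact 3+3 digit limit, are illustrative, not the PR's actual code; the key property is that both operands of the final division are exact integers, so there is only one rounding:

```c
#include <stdbool.h>

static const long   scale_int[] = { 1, 10, 100, 1000 };
static const double scale_dbl[] = { 1.0, 10.0, 100.0, 1000.0 };

/* Fold up to 3 integer and 3 fractional digits into one exact integer,
 * then divide once by a precomputed power of ten. Anything longer, or
 * anything with an exponent or other syntax, bails out to the caller. */
static bool parse_short_decimal(const char *s, double *out) {
    bool neg = (*s == '-');
    if (neg) s++;

    long ipart = 0; int idigits = 0;
    while (*s >= '0' && *s <= '9') {
        if (++idigits > 3) return false;        /* too long for this path */
        ipart = ipart * 10 + (*s++ - '0');
    }
    long fpart = 0; int fdigits = 0;
    if (*s == '.') {
        s++;
        while (*s >= '0' && *s <= '9') {
            if (++fdigits > 3) return false;
            fpart = fpart * 10 + (*s++ - '0');
        }
    }
    if (*s != '\0' || idigits == 0)             /* exponent, hex, junk */
        return false;

    long scaled = ipart * scale_int[fdigits] + fpart;  /* exact, <= 999999 */
    double v = (double)scaled / scale_dbl[fdigits];    /* one rounding */
    *out = neg ? -v : v;
    return true;
}
```

Because 999999 and 1000 are both exactly representable, the single division gives the same correctly rounded result a full strtod would, just without any of its machinery.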
After implementing the fast paths, I ran the same benchmarks against Ruby master (3 million iterations):
| Input Type | Master | Optimized | Improvement |
|---|---|---|---|
| Simple decimals ("1.5", "3.14") | 0.154s | 0.125s | 19% faster |
| Prices ("9.99", "19.95") | 0.155s | 0.125s | 19% faster |
| Small integers ("5", "42") | 0.149s | 0.116s | 22% faster |
| Math constants ("3.141592653589793") | 0.674s | 0.197s | 3.4x faster |
| High precision ("0.123456789012345") | 0.554s | 0.199s | 2.8x faster |
| Scientific ("1e5", "2e10") | 0.154s | 0.153s | ~same |
The numbers that were 9% slower are now 19-22% faster. The complex numbers that were 2.6x faster are now 2.8-3.4x faster. No regressions anywhere.
Based on this work, I submitted PR #15655 to Ruby. The implementation adds about 320 lines to object.c plus a 10KB lookup table for powers of 5.
My first benchmark was designed to make me feel good about my work. It covered “various formats” which happened to include cases where Eisel-Lemire shines. Only when I forced myself to benchmark what Ruby actually does did reality show up.
My original conclusion wasn’t wrong – pure Eisel-Lemire is slower for simple numbers. The mistake was treating it as an all-or-nothing choice. Theoretical performance gains are hypotheses. Benchmarks against real workloads are proof. And sometimes the best optimization isn’t replacing an algorithm – it’s knowing when not to run it.
The post Ruby Floats: When 2.6x Faster Is Actually Slower (and Then Faster Again) appeared first on Closer to Code.