Formatting an entire 25 million line codebase overnight: the rubyfmt story

[-]

oliver_extracts@reddit

the technical part (using .git-blame-ignore-revs) is the easy half. the harder problem for most teams is selling the overnight reformat internally. i've seen smaller reformats stall for months because someone's tool depends on the existing whitespace, or someone's mental model of the file is tied to specific line numbers, or there's a PR queue with hundreds of open PRs that would all need rebasing.

stripe doing this at 25M lines overnight means they probably had a coordination layer most teams underestimate. blog posts make it sound clean, the actual political work to get there is usually months.

[-]

tj-horner@reddit

someone’s mental model of the file is tied to specific line numbers

That’s… hmm. I have many questions

[-]

oliver_extracts@reddit

haha yeah, hard to believe until you've worked with someone who does it. i've worked with engineers who refer to files by line numbers in their head, like "the bug is around line 240 in user_service.rb" and they'll navigate there directly without searching. reformat hits, line 240 is now something else, their mental map breaks and they're noticeably slower for a few weeks until they rebuild it.

more common with vim/emacs users who navigate by line number a lot. less of a thing for vscode-mouse-navigation people. but i've seen it block reformats more than once.

[-]

mahreow@reddit

That's dumb and anyone who blocks a reformat because of it is dead weight and should be fired

[-]

CherryLongjump1989@reddit

In my experience they will get upset if anyone but them tries to modify their file. Also just in my experience, they tend to be attracted to very small but critical niches that require lots of maintenance. But that also makes them more difficult to fire.

[-]

bloodwhore@reddit

This isn't someone you want in your company anyway. Get rid of them ASAP.

[-]

TheAlaskanMailman@reddit

Wait… so people usually don’t have a mental image of the file and where everything is?
I use vim but editor should be irrelevant here.

How do you guys navigate to “that” part without knowing the shape of the file?

[-]

jxddk@reddit

Also a (Neo)vim user, I'm either navigating through semi-permanent marks (e.g. 'U drops me at the User class in most projects), or by LSP workspace symbols (e.g. fuzzy-finding class Us). I have a rough idea of what line number ranges are interesting but my mental model for the "shape" of the file is very much based on symbols rather than linebreaks.

[-]

CherryLongjump1989@reddit

Most projects have a User class?

[-]

ZorbaTHut@reddit

ctrl-f functionname

Or more likely, ctrl-shift-f functionname. Who needs line numbers when you can just go to the right code, even if it's been moved?

[-]

TheAlaskanMailman@reddit

Yeah. That’s the LSP’s job. But if you just want to go to a specific part directly, that’s what I’m talking about

[-]

BigHandLittleSlap@reddit

Ctrl-click.

[-]

ShinyHappyREM@reddit

more common with vim/emacs users

And perhaps old oldschool BASIC programmers.

[-]

oliver_extracts@reddit

haha for them line numbers literally ARE the syntax. they figured this out decades before the rest of us.

[-]

franklindstallone@reddit

That’s someone you get rid of

[-]

obetu5432@reddit

the real /r/programminghorror was using ruby in the first place, in the year of our Lord 2010+16

[-]

_BreakingGood_@reddit

honestly it's hard to make that argument, this is one of the most reliable and robust saas services in existence. arguably the most reliable.

[-]

Freeky@reddit

With enough thrust even a brick is a viable aircraft. Stripe certainly brought the lbf.

[-]

qmunke@reddit

All dynamically typed languages end up having to invent type checking eventually because dynamic typing is a fundamentally unsound idea.

[-]

sidonay@reddit

Counterpoint: it’s cool to hate on web languages that work (PHP, ruby) 😎

(/s btw )

[-]

nNaz@reddit

As a rust dev I find ruby perf terrible but it’s an amazingly elegant and concise language to write in. If we ignore perf, Rails is an excellent MVC framework and a pleasure to develop with. Better devx and APIs than Django and far less boilerplate than fastapi and express.

[-]

paca-vaca@reddit

It's cool, but one moment I don't understand, why the whole codebase wasn't formatted in the first place? CI setup goes from the day 1.

One installs a rubocop plugin and runs it on save, in a git pre-commit hook, and in CI as part of the linting process. That way you don't have to overwrite the whole codebase history by formatting.

[-]

totoro27@reddit

I don't understand, why the whole codebase wasn't formatted in the first place? CI setup goes from the day 1.

Legacy code base.

[-]

chucker23n@reddit

why the whole codebase wasn't formatted in the first place? CI setup goes from the day 1.

First: that's a fantasy.

And second: even if you do have all that set up from the start (management will rightfully ask if there aren't bigger fish to fry), you might want to change some formatting rules. Maybe the linter has new features. Maybe there's been enough rotation/attrition in the team that the new team members no longer agree with some of them.

[-]

sammymammy2@reddit

I don't get the whole ripper thing. Ripper is a Ruby library, right? So parsing is still done by executing Ruby code? Or does Ripper just go straight into the Ruby VM lexer and parser? Because that's what I'd do, run the Ruby VM and gobble up its parsing results, emitting it as pretty-printed code.

[-]

masklinn@reddit

Because that's what I'd do, run the Ruby VM and gobble up its parsing results

That's basically the entirety of the "Rewriting in Rust" section.

[-]

Agreeable-Price8343@reddit

"linking a full Ruby VM into a Rust binary to walk its parse tree in memory isn't a normal thing to do. But it worked, and that was enough for now."

this is the right kind of pragmatism. the alternative is someone insisting on a pure rust ruby parser in 2018 and the project never shipping. do the ugly thing first, clean it up when the ecosystem catches up, which is exactly what happened with the prism migration

[-]

masklinn@reddit

TBF that is more or less the origin story of ripper (sans pure ruby, or never shipping), eventually it was merged into the stdlib and ended up integrated directly into the yacc file definition (https://github.com/ruby/ruby/blob/79f9f8326a34e499bb2d84d8282943188b1131bd/parse.y#L1519).
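
For anyone wondering what Ripper actually hands back, here is a minimal sketch, assuming a stock Ruby with the stdlib `ripper` library available. The sample source string and the `node_types` helper are made up for illustration, and none of this is rubyfmt's own code. The Ruby you write only calls into Ripper; the parsing itself runs in the grammar built from parse.y.

```ruby
require "ripper"

# Sample input, chosen only for illustration.
source = "total = 1 + 2"

# Ripper.sexp returns the parse tree as nested arrays (an S-expression),
# or nil if the source does not parse.
tree = Ripper.sexp(source)

# Ripper.lex returns the token stream; each entry starts with a
# [line, column] position, followed by the token type and its text.
tokens = Ripper.lex(source).map { |_pos, type, text, _state| [type, text] }

# Hypothetical helper: collect the node type symbols in depth-first order,
# which is roughly the walk a formatter does before printing code back out.
def node_types(node, acc = [])
  return acc unless node.is_a?(Array)

  acc << node.first if node.first.is_a?(Symbol)
  node.each { |child| node_types(child, acc) }
  acc
end

types = node_types(tree)
```

A formatter built this way never has to re-implement Ruby's grammar; it only has to decide how to print each node type it meets in the tree.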

[-]

peripateticman2026@reddit

25 MLOC? Probably 24 MLOC over what's required.

[-]

revereddesecration@reddit

How does it affect the usability of the repo from a history perspective?

I’d be tempted to go deeper and rework the history of the repo such that the previous commits are formatted retroactively. That’s probably too large a job on a codebase of Stripe’s size though.

[-]

sephirostoy@reddit

You can exclude a list of commits to ignore during blame via a .git-blame-ignore-revs
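
A rough sketch of what generating that file could look like. The commit hash and note below are placeholders, not real Stripe commits, and the `ignore_revs_contents` helper is hypothetical. The file format itself is just one full commit hash per line, with `#` comments and blank lines allowed.

```ruby
# Placeholder entry for illustration only; a real list would hold the full
# 40-character hashes of the formatting-only commits.
FORMAT_COMMITS = [
  { sha: "0123456789abcdef0123456789abcdef01234567", note: "Reformat with rubyfmt" },
].freeze

# Abbreviated hashes are rejected here, since git expects full object names
# in this file.
FULL_SHA = /\A[0-9a-f]{40}\z/

# Hypothetical helper: build the text of a .git-blame-ignore-revs file,
# one comment line plus one hash line per commit.
def ignore_revs_contents(commits)
  lines = commits.flat_map do |commit|
    unless commit[:sha].match?(FULL_SHA)
      raise ArgumentError, "need a full 40-character hash: #{commit[:sha]}"
    end

    ["# #{commit[:note]}", commit[:sha]]
  end
  lines.join("\n") + "\n"
end

contents = ignore_revs_contents(FORMAT_COMMITS)

# To write it out and point git at it:
#   File.write(".git-blame-ignore-revs", contents)
#   git config blame.ignoreRevsFile .git-blame-ignore-revs
# Or per invocation:
#   git blame --ignore-revs-file .git-blame-ignore-revs path/to/file.rb
```

The `git config` step is per clone, so each developer (or a repo setup script) has to run it once.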

[-]

Schmittfried@reddit

I wish all tools honored it automatically.

[-]

Roang_zero1@reddit

Alternatively you can also ignore revs for tools like blame. I would check in a blame ignore file (see the git-blame documentation).

Some forges like github will also ignore these revs for their views then.
